loading up the forgejo repo on tangled to test page performance

# Refactor indexer (#25174)

Refactor `modules/indexer` to make it more maintainable and easier to
extend with new features. I'm trying to solve some issue-searching
problems; this refactor is a precursor to those functional changes.

Currently supported engines and their index versions:

| engine | issues | code |
| - | - | - |
| db | just a wrapper for database queries, needs no version | - |
| bleve | the version of the index is **2** | the version of the index is **6** |
| elasticsearch | the old index has no version and will be treated as version **0** in this PR | the version of the index is **1** |
| meilisearch | the old index has no version and will be treated as version **0** in this PR | - |
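For context, the engine is selected in `app.ini`; a minimal sketch based on the config keys this PR touches (the values are illustrative, not defaults):

```ini
[indexer]
; pick one of: bleve, db, elasticsearch, meilisearch
ISSUE_INDEXER_TYPE = elasticsearch
; required for elasticsearch/meilisearch, e.g.:
ISSUE_INDEXER_CONN_STR = http://elastic:password@localhost:9200
ISSUE_INDEXER_NAME = gitea_issues
```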


## Changes

### Split

Split it into multiple packages:

```text
indexer
├── internal
│   ├── bleve
│   ├── db
│   ├── elasticsearch
│   └── meilisearch
├── code
│   ├── bleve
│   ├── elasticsearch
│   └── internal
└── issues
    ├── bleve
    ├── db
    ├── elasticsearch
    ├── internal
    └── meilisearch
```

- `indexer/internal`: Internal shared package for the indexer.
- `indexer/internal/[engine]`: Internal shared package for each engine
(bleve/db/elasticsearch/meilisearch).
- `indexer/code`: Implementations for the code indexer.
- `indexer/code/internal`: Internal shared package for the code indexer.
- `indexer/code/[engine]`: Implementation of the code indexer via each
engine.
- `indexer/issues`: Implementations for the issues indexer.
- `indexer/issues/internal`: Internal shared package for the issues indexer.
- `indexer/issues/[engine]`: Implementation of the issues indexer via each
engine.
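The layout above follows a common Go pattern; a hypothetical mirror of it (names are illustrative, not Gitea's actual API): an `internal` package defines the engine-agnostic interface, each engine package implements it, and the top-level package picks an engine by configured type.

```go
package main

import "fmt"

// Indexer is the engine-agnostic interface that indexer/internal would
// define in this layout (sketch only).
type Indexer interface {
	Ping() bool
	Close()
}

// bleveIndexer stands in for an engine package such as indexer/code/bleve.
type bleveIndexer struct{ dir string }

func (bleveIndexer) Ping() bool { return true }
func (bleveIndexer) Close()     {}

// newIndexer is how the top-level package would select an engine from
// the configured type string.
func newIndexer(typ, dir string) (Indexer, error) {
	switch typ {
	case "bleve":
		return bleveIndexer{dir: dir}, nil
	default:
		return nil, fmt.Errorf("unsupported indexer type: %q", typ)
	}
}

func main() {
	idx, err := newIndexer("bleve", "indexers/issues.bleve")
	fmt.Println(idx.Ping(), err)
}
```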

### Deduplication

- Combine `Init/Ping/Close` for the code indexer and the issues indexer.
- ~~Combine `issues.indexerHolder` and `code.wrappedIndexer` into
`internal.IndexHolder`.~~ Removed; a dummy indexer is used instead when the
indexer is not ready.
- Deduplicate the two copies of ES client creation.
- Deduplicate the two copies of `indexerID()`.
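The "dummy indexer" idea can be sketched as follows (illustrative names, not Gitea's actual API): instead of guarding a nil indexer behind holders like `issues.indexerHolder`/`code.wrappedIndexer`, a stand-in implementation satisfies the same interface and fails every call until the real engine is ready.

```go
package main

import (
	"errors"
	"fmt"
)

// Indexer is a hypothetical engine interface for this sketch.
type Indexer interface {
	Index(data string) error
	Ping() bool
}

var errNotReady = errors.New("indexer is not ready")

// dummyIndexer is swapped in while the real indexer is still
// initializing; every call fails instead of dereferencing nil.
type dummyIndexer struct{}

func (dummyIndexer) Index(string) error { return errNotReady }
func (dummyIndexer) Ping() bool         { return false }

func main() {
	var idx Indexer = dummyIndexer{} // before init completes
	err := idx.Index("doc-1")
	fmt.Println(idx.Ping(), err)
}
```

This keeps callers free of nil checks: they always hold a valid `Indexer`, and readiness is just a matter of which implementation is currently installed.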


### Enhancement

- [x] Support index versions for the elasticsearch issues indexer; an old
index without a version will be treated as version 0.
- [x] Fix the spelling of `elastic_search`/`ElasticSearch`; it should be
`Elasticsearch`.
- [x] Improve versioning of the ES index. We don't need `Aliases`:
  - Gitea doesn't need aliases for "zero downtime" because it never
deletes old indexes.
  - The old issues indexer code uses the original name to create the issue
index, so it's tricky to convert it to an alias.
- [x] Support index versions for the meilisearch issues indexer; an old
index without a version will be treated as version 0.
- [x] Do a "ping" only when `Ping` is called; don't ping periodically and
cache the status.
- [x] Support the context parameter whenever possible.
- [x] Fix the outdated example config.
- [x] Give up the old requeue logic of the issues indexer (when indexing
failed, it called `Ping` to check whether the failure was caused by the
engine being unavailable, and requeued the task only in that case):
  - It is fragile and tricky, and it can cause data loss (this actually
happened during my tests for this PR). It also works for ES only.
  - Just always requeue the failed task; if the failure is caused by bad
data, that's a bug in Gitea which should be fixed.
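The index-versioning scheme above can be illustrated with a small sketch. The `"%s.v%d"` naming pattern appears in the indexer code (`realIndexerName` in the old `elastic_search.go`); treating a bare, unversioned name as version 0 is what lets pre-existing indexes participate in the upgrade path. The helper name here is hypothetical.

```go
package main

import "fmt"

// versionedIndexName maps an index base name and version to the real
// index name. Version 0 is the legacy, unversioned index, which was
// created under the base name itself.
func versionedIndexName(base string, version int) string {
	if version == 0 {
		return base // old index created before versioning existed
	}
	return fmt.Sprintf("%s.v%d", base, version)
}

func main() {
	fmt.Println(versionedIndexName("gitea_issues", 0))
	fmt.Println(versionedIndexName("gitea_codes", 1))
}
```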

---------

Co-authored-by: Giteabot <teabot@gitea.io>

Authored by Jason Song and Giteabot; committed by GitHub (375fd15f b0215c40).

Total: +1836 -1888
`custom/conf/app.example.ini` (+3 -3)

```diff
···
 ;; Issue indexer storage path, available when ISSUE_INDEXER_TYPE is bleve
 ;ISSUE_INDEXER_PATH = indexers/issues.bleve ; Relative paths will be made absolute against _`AppWorkPath`_.
 ;;
-;; Issue indexer connection string, available when ISSUE_INDEXER_TYPE is elasticsearch or meilisearch
-;ISSUE_INDEXER_CONN_STR = http://elastic:changeme@localhost:9200
+;; Issue indexer connection string, available when ISSUE_INDEXER_TYPE is elasticsearch (e.g. http://elastic:password@localhost:9200) or meilisearch (e.g. http://:apikey@localhost:7700)
+;ISSUE_INDEXER_CONN_STR =
 ;;
-;; Issue indexer name, available when ISSUE_INDEXER_TYPE is elasticsearch
+;; Issue indexer name, available when ISSUE_INDEXER_TYPE is elasticsearch or meilisearch.
 ;ISSUE_INDEXER_NAME = gitea_issues
 ;;
 ;; Timeout the indexer if it takes longer than this to start.
```
`docs/content/doc/administration/config-cheat-sheet.en-us.md` (+3 -3)

```diff
···
 ## Indexer (`indexer`)
 
 - `ISSUE_INDEXER_TYPE`: **bleve**: Issue indexer type, currently supported: `bleve`, `db`, `elasticsearch` or `meilisearch`.
-- `ISSUE_INDEXER_CONN_STR`: ****: Issue indexer connection string, available when ISSUE_INDEXER_TYPE is elasticsearch, or meilisearch. i.e. http://elastic:changeme@localhost:9200
-- `ISSUE_INDEXER_NAME`: **gitea_issues**: Issue indexer name, available when ISSUE_INDEXER_TYPE is elasticsearch
+- `ISSUE_INDEXER_CONN_STR`: ****: Issue indexer connection string, available when ISSUE_INDEXER_TYPE is elasticsearch (e.g. http://elastic:password@localhost:9200) or meilisearch (e.g. http://:apikey@localhost:7700)
+- `ISSUE_INDEXER_NAME`: **gitea_issues**: Issue indexer name, available when ISSUE_INDEXER_TYPE is elasticsearch or meilisearch.
 - `ISSUE_INDEXER_PATH`: **indexers/issues.bleve**: Index file used for issue search; available when ISSUE_INDEXER_TYPE is bleve and elasticsearch. Relative paths will be made absolute against _`AppWorkPath`_.
 
 - `REPO_INDEXER_ENABLED`: **false**: Enables code search (uses a lot of disk space, about 6 times more than the repository size).
 - `REPO_INDEXER_REPO_TYPES`: **sources,forks,mirrors,templates**: Repo indexer units. The items to index could be `sources`, `forks`, `mirrors`, `templates` or any combination of them separated by a comma. If empty then it defaults to `sources` only, as if you'd like to disable fully please see `REPO_INDEXER_ENABLED`.
 - `REPO_INDEXER_TYPE`: **bleve**: Code search engine type, could be `bleve` or `elasticsearch`.
 - `REPO_INDEXER_PATH`: **indexers/repos.bleve**: Index file used for code search.
-- `REPO_INDEXER_CONN_STR`: ****: Code indexer connection string, available when `REPO_INDEXER_TYPE` is elasticsearch. i.e. http://elastic:changeme@localhost:9200
+- `REPO_INDEXER_CONN_STR`: ****: Code indexer connection string, available when `REPO_INDEXER_TYPE` is elasticsearch. i.e. http://elastic:password@localhost:9200
 - `REPO_INDEXER_NAME`: **gitea_codes**: Code indexer name, available when `REPO_INDEXER_TYPE` is elasticsearch
 
 - `REPO_INDEXER_INCLUDE`: **empty**: A comma separated list of glob patterns (see https://github.com/gobwas/glob) to **include** in the index. Use `**.txt` to match any files with .txt extension. An empty list means include all files.
```
`modules/context/repo.go` (+1 -1)

```diff
···
 
 	ctx.Data["RepoSearchEnabled"] = setting.Indexer.RepoIndexerEnabled
 	if setting.Indexer.RepoIndexerEnabled {
-		ctx.Data["CodeIndexerUnavailable"] = !code_indexer.IsAvailable()
+		ctx.Data["CodeIndexerUnavailable"] = !code_indexer.IsAvailable(ctx)
 	}
 
 	if ctx.IsSigned {
```
`modules/indexer/bleve/batch.go` → `modules/indexer/internal/bleve/batch.go` (renamed)
`modules/indexer/code/bleve.go` → `modules/indexer/code/bleve/bleve.go` (+36 -120)

```diff
···
 // Copyright 2019 The Gitea Authors. All rights reserved.
 // SPDX-License-Identifier: MIT
 
-package code
+package bleve
 
 import (
 	"bufio"
 	"context"
 	"fmt"
 	"io"
-	"os"
 	"strconv"
 	"strings"
 	"time"
···
 	"code.gitea.io/gitea/modules/analyze"
 	"code.gitea.io/gitea/modules/charset"
 	"code.gitea.io/gitea/modules/git"
-	gitea_bleve "code.gitea.io/gitea/modules/indexer/bleve"
+	"code.gitea.io/gitea/modules/indexer/code/internal"
+	indexer_internal "code.gitea.io/gitea/modules/indexer/internal"
+	inner_bleve "code.gitea.io/gitea/modules/indexer/internal/bleve"
 	"code.gitea.io/gitea/modules/log"
 	"code.gitea.io/gitea/modules/setting"
 	"code.gitea.io/gitea/modules/timeutil"
 	"code.gitea.io/gitea/modules/typesniffer"
-	"code.gitea.io/gitea/modules/util"
 
 	"github.com/blevesearch/bleve/v2"
 	analyzer_custom "github.com/blevesearch/bleve/v2/analysis/analyzer/custom"
···
 	"github.com/blevesearch/bleve/v2/analysis/token/lowercase"
 	"github.com/blevesearch/bleve/v2/analysis/token/unicodenorm"
 	"github.com/blevesearch/bleve/v2/analysis/tokenizer/unicode"
-	"github.com/blevesearch/bleve/v2/index/upsidedown"
 	"github.com/blevesearch/bleve/v2/mapping"
 	"github.com/blevesearch/bleve/v2/search/query"
-	"github.com/ethantkoenig/rupture"
 	"github.com/go-enry/go-enry/v2"
 )
 
···
 	})
 }
 
-// openBleveIndexer open the index at the specified path, checking for metadata
-// updates and bleve version updates. If index needs to be created (or
-// re-created), returns (nil, nil)
-func openBleveIndexer(path string, latestVersion int) (bleve.Index, error) {
-	_, err := os.Stat(path)
-	if err != nil && os.IsNotExist(err) {
-		return nil, nil
-	} else if err != nil {
-		return nil, err
-	}
-
-	metadata, err := rupture.ReadIndexMetadata(path)
-	if err != nil {
-		return nil, err
-	}
-	if metadata.Version < latestVersion {
-		// the indexer is using a previous version, so we should delete it and
-		// re-populate
-		return nil, util.RemoveAll(path)
-	}
-
-	index, err := bleve.Open(path)
-	if err != nil && err == upsidedown.IncompatibleVersion {
-		// the indexer was built with a previous version of bleve, so we should
-		// delete it and re-populate
-		return nil, util.RemoveAll(path)
-	} else if err != nil {
-		return nil, err
-	}
-	return index, nil
-}
-
 // RepoIndexerData data stored in the repo indexer
 type RepoIndexerData struct {
 	RepoID int64
···
 	repoIndexerLatestVersion = 6
 )
 
-// createBleveIndexer create a bleve repo indexer if one does not already exist
-func createBleveIndexer(path string, latestVersion int) (bleve.Index, error) {
+// generateBleveIndexMapping generates a bleve index mapping for the repo indexer
+func generateBleveIndexMapping() (mapping.IndexMapping, error) {
 	docMapping := bleve.NewDocumentMapping()
 	numericFieldMapping := bleve.NewNumericFieldMapping()
 	numericFieldMapping.IncludeInAll = false
···
 	mapping.AddDocumentMapping(repoIndexerDocType, docMapping)
 	mapping.AddDocumentMapping("_all", bleve.NewDocumentDisabledMapping())
 
-	indexer, err := bleve.New(path, mapping)
-	if err != nil {
-		return nil, err
-	}
-
-	if err = rupture.WriteIndexMetadata(path, &rupture.IndexMetadata{
-		Version: latestVersion,
-	}); err != nil {
-		return nil, err
-	}
-	return indexer, nil
+	return mapping, nil
 }
 
-var _ Indexer = &BleveIndexer{}
+var _ internal.Indexer = &Indexer{}
 
-// BleveIndexer represents a bleve indexer implementation
-type BleveIndexer struct {
-	indexDir string
-	indexer  bleve.Index
+// Indexer represents a bleve indexer implementation
+type Indexer struct {
+	inner                    *inner_bleve.Indexer
+	indexer_internal.Indexer // do not composite inner_bleve.Indexer directly to avoid exposing too much
 }
 
-// NewBleveIndexer creates a new bleve local indexer
-func NewBleveIndexer(indexDir string) (*BleveIndexer, bool, error) {
-	indexer := &BleveIndexer{
-		indexDir: indexDir,
-	}
-	created, err := indexer.init()
-	if err != nil {
-		indexer.Close()
-		return nil, false, err
+// NewIndexer creates a new bleve local indexer
+func NewIndexer(indexDir string) *Indexer {
+	inner := inner_bleve.NewIndexer(indexDir, repoIndexerLatestVersion, generateBleveIndexMapping)
+	return &Indexer{
+		Indexer: inner,
+		inner:   inner,
 	}
-	return indexer, created, err
 }
 
-func (b *BleveIndexer) addUpdate(ctx context.Context, batchWriter git.WriteCloserError, batchReader *bufio.Reader, commitSha string,
-	update fileUpdate, repo *repo_model.Repository, batch *gitea_bleve.FlushingBatch,
+func (b *Indexer) addUpdate(ctx context.Context, batchWriter git.WriteCloserError, batchReader *bufio.Reader, commitSha string,
+	update internal.FileUpdate, repo *repo_model.Repository, batch *inner_bleve.FlushingBatch,
 ) error {
 	// Ignore vendored files in code search
 	if setting.Indexer.ExcludeVendored && analyze.IsVendor(update.Filename) {
···
 	if _, err = batchReader.Discard(1); err != nil {
 		return err
 	}
-	id := filenameIndexerID(repo.ID, update.Filename)
+	id := internal.FilenameIndexerID(repo.ID, update.Filename)
 	return batch.Index(id, &RepoIndexerData{
 		RepoID:   repo.ID,
 		CommitID: commitSha,
···
 	})
 }
 
-func (b *BleveIndexer) addDelete(filename string, repo *repo_model.Repository, batch *gitea_bleve.FlushingBatch) error {
-	id := filenameIndexerID(repo.ID, filename)
+func (b *Indexer) addDelete(filename string, repo *repo_model.Repository, batch *inner_bleve.FlushingBatch) error {
+	id := internal.FilenameIndexerID(repo.ID, filename)
 	return batch.Delete(id)
 }
 
-// init init the indexer
-func (b *BleveIndexer) init() (bool, error) {
-	var err error
-	b.indexer, err = openBleveIndexer(b.indexDir, repoIndexerLatestVersion)
-	if err != nil {
-		return false, err
-	}
-	if b.indexer != nil {
-		return false, nil
-	}
-
-	b.indexer, err = createBleveIndexer(b.indexDir, repoIndexerLatestVersion)
-	if err != nil {
-		return false, err
-	}
-
-	return true, nil
-}
-
-// Close close the indexer
-func (b *BleveIndexer) Close() {
-	log.Debug("Closing repo indexer")
-	if b.indexer != nil {
-		err := b.indexer.Close()
-		if err != nil {
-			log.Error("Error whilst closing the repository indexer: %v", err)
-		}
-	}
-	log.Info("PID: %d Repository Indexer closed", os.Getpid())
-}
-
-// Ping does nothing
-func (b *BleveIndexer) Ping() bool {
-	return true
-}
-
 // Index indexes the data
-func (b *BleveIndexer) Index(ctx context.Context, repo *repo_model.Repository, sha string, changes *repoChanges) error {
-	batch := gitea_bleve.NewFlushingBatch(b.indexer, maxBatchSize)
+func (b *Indexer) Index(ctx context.Context, repo *repo_model.Repository, sha string, changes *internal.RepoChanges) error {
+	batch := inner_bleve.NewFlushingBatch(b.inner.Indexer, maxBatchSize)
 	if len(changes.Updates) > 0 {
 
 		// Now because of some insanity with git cat-file not immediately failing if not run in a valid git directory we need to run git rev-parse first!
···
 	}
 
 // Delete deletes indexes by ids
-func (b *BleveIndexer) Delete(repoID int64) error {
+func (b *Indexer) Delete(_ context.Context, repoID int64) error {
 	query := numericEqualityQuery(repoID, "RepoID")
 	searchRequest := bleve.NewSearchRequestOptions(query, 2147483647, 0, false)
-	result, err := b.indexer.Search(searchRequest)
+	result, err := b.inner.Indexer.Search(searchRequest)
 	if err != nil {
 		return err
 	}
-	batch := gitea_bleve.NewFlushingBatch(b.indexer, maxBatchSize)
+	batch := inner_bleve.NewFlushingBatch(b.inner.Indexer, maxBatchSize)
 	for _, hit := range result.Hits {
 		if err = batch.Delete(hit.ID); err != nil {
 			return err
···
 
 // Search searches for files in the specified repo.
 // Returns the matching file-paths
-func (b *BleveIndexer) Search(ctx context.Context, repoIDs []int64, language, keyword string, page, pageSize int, isMatch bool) (int64, []*SearchResult, []*SearchResultLanguages, error) {
+func (b *Indexer) Search(ctx context.Context, repoIDs []int64, language, keyword string, page, pageSize int, isMatch bool) (int64, []*internal.SearchResult, []*internal.SearchResultLanguages, error) {
 	var (
 		indexerQuery query.Query
 		keywordQuery query.Query
···
 		searchRequest.AddFacet("languages", bleve.NewFacetRequest("Language", 10))
 	}
 
-	result, err := b.indexer.SearchInContext(ctx, searchRequest)
+	result, err := b.inner.Indexer.SearchInContext(ctx, searchRequest)
 	if err != nil {
 		return 0, nil, nil, err
 	}
 
 	total := int64(result.Total)
 
-	searchResults := make([]*SearchResult, len(result.Hits))
+	searchResults := make([]*internal.SearchResult, len(result.Hits))
 	for i, hit := range result.Hits {
 		startIndex, endIndex := -1, -1
 		for _, locations := range hit.Locations["Content"] {
···
 		if t, err := time.Parse(time.RFC3339, hit.Fields["UpdatedAt"].(string)); err == nil {
 			updatedUnix = timeutil.TimeStamp(t.Unix())
 		}
-		searchResults[i] = &SearchResult{
+		searchResults[i] = &internal.SearchResult{
 			RepoID:     int64(hit.Fields["RepoID"].(float64)),
 			StartIndex: startIndex,
 			EndIndex:   endIndex,
-			Filename:   filenameOfIndexerID(hit.ID),
+			Filename:   internal.FilenameOfIndexerID(hit.ID),
 			Content:    hit.Fields["Content"].(string),
 			CommitID:   hit.Fields["CommitID"].(string),
 			UpdatedUnix: updatedUnix,
···
 		}
 	}
 
-	searchResultLanguages := make([]*SearchResultLanguages, 0, 10)
+	searchResultLanguages := make([]*internal.SearchResultLanguages, 0, 10)
 	if len(language) > 0 {
 		// Use separate query to go get all language counts
 		facetRequest := bleve.NewSearchRequestOptions(facetQuery, 1, 0, false)
···
 		facetRequest.IncludeLocations = true
 		facetRequest.AddFacet("languages", bleve.NewFacetRequest("Language", 10))
 
-		if result, err = b.indexer.Search(facetRequest); err != nil {
+		if result, err = b.inner.Indexer.Search(facetRequest); err != nil {
 			return 0, nil, nil, err
 		}
 
···
 			if len(term.Term) == 0 {
 				continue
 			}
-			searchResultLanguages = append(searchResultLanguages, &SearchResultLanguages{
+			searchResultLanguages = append(searchResultLanguages, &internal.SearchResultLanguages{
 				Language: term.Term,
 				Color:    enry.GetColor(term.Term),
 				Count:    term.Count,
```
`modules/indexer/code/bleve_test.go` (-30, file removed)

```diff
···
-// Copyright 2019 The Gitea Authors. All rights reserved.
-// SPDX-License-Identifier: MIT
-
-package code
-
-import (
-	"testing"
-
-	"code.gitea.io/gitea/models/unittest"
-
-	"github.com/stretchr/testify/assert"
-)
-
-func TestBleveIndexAndSearch(t *testing.T) {
-	unittest.PrepareTestEnv(t)
-
-	dir := t.TempDir()
-
-	idx, _, err := NewBleveIndexer(dir)
-	if err != nil {
-		assert.Fail(t, "Unable to create bleve indexer Error: %v", err)
-		if idx != nil {
-			idx.Close()
-		}
-		return
-	}
-	defer idx.Close()
-
-	testIndexer("beleve", t, idx)
-}
```
`modules/indexer/code/elastic_search.go` (-512, file removed)

```diff
···
-// Copyright 2020 The Gitea Authors. All rights reserved.
-// SPDX-License-Identifier: MIT
-
-package code
-
-import (
-	"bufio"
-	"context"
-	"errors"
-	"fmt"
-	"io"
-	"net"
-	"strconv"
-	"strings"
-	"sync"
-	"time"
-
-	repo_model "code.gitea.io/gitea/models/repo"
-	"code.gitea.io/gitea/modules/analyze"
-	"code.gitea.io/gitea/modules/charset"
-	"code.gitea.io/gitea/modules/git"
-	"code.gitea.io/gitea/modules/graceful"
-	"code.gitea.io/gitea/modules/json"
-	"code.gitea.io/gitea/modules/log"
-	"code.gitea.io/gitea/modules/setting"
-	"code.gitea.io/gitea/modules/timeutil"
-	"code.gitea.io/gitea/modules/typesniffer"
-
-	"github.com/go-enry/go-enry/v2"
-	"github.com/olivere/elastic/v7"
-)
-
-const (
-	esRepoIndexerLatestVersion = 1
-	// multi-match-types, currently only 2 types are used
-	// Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.0/query-dsl-multi-match-query.html#multi-match-types
-	esMultiMatchTypeBestFields   = "best_fields"
-	esMultiMatchTypePhrasePrefix = "phrase_prefix"
-)
-
-var _ Indexer = &ElasticSearchIndexer{}
-
-// ElasticSearchIndexer implements Indexer interface
-type ElasticSearchIndexer struct {
-	client           *elastic.Client
-	indexerAliasName string
-	available        bool
-	stopTimer        chan struct{}
-	lock             sync.RWMutex
-}
-
-// NewElasticSearchIndexer creates a new elasticsearch indexer
-func NewElasticSearchIndexer(url, indexerName string) (*ElasticSearchIndexer, bool, error) {
-	opts := []elastic.ClientOptionFunc{
-		elastic.SetURL(url),
-		elastic.SetSniff(false),
-		elastic.SetHealthcheckInterval(10 * time.Second),
-		elastic.SetGzip(false),
-	}
-
-	logger := log.GetLogger(log.DEFAULT)
-
-	opts = append(opts, elastic.SetTraceLog(&log.PrintfLogger{Logf: logger.Trace}))
-	opts = append(opts, elastic.SetInfoLog(&log.PrintfLogger{Logf: logger.Info}))
-	opts = append(opts, elastic.SetErrorLog(&log.PrintfLogger{Logf: logger.Error}))
-
-	client, err := elastic.NewClient(opts...)
-	if err != nil {
-		return nil, false, err
-	}
-
-	indexer := &ElasticSearchIndexer{
-		client:           client,
-		indexerAliasName: indexerName,
-		available:        true,
-		stopTimer:        make(chan struct{}),
-	}
-
-	ticker := time.NewTicker(10 * time.Second)
-	go func() {
-		for {
-			select {
-			case <-ticker.C:
-				indexer.checkAvailability()
-			case <-indexer.stopTimer:
-				ticker.Stop()
-				return
-			}
-		}
-	}()
-
-	exists, err := indexer.init()
-	if err != nil {
-		indexer.Close()
-		return nil, false, err
-	}
-	return indexer, !exists, err
-}
-
-const (
-	defaultMapping = `{
-		"mappings": {
-			"properties": {
-				"repo_id": {
-					"type": "long",
-					"index": true
-				},
-				"content": {
-					"type": "text",
-					"term_vector": "with_positions_offsets",
-					"index": true
-				},
-				"commit_id": {
-					"type": "keyword",
-					"index": true
-				},
-				"language": {
-					"type": "keyword",
-					"index": true
-				},
-				"updated_at": {
-					"type": "long",
-					"index": true
-				}
-			}
-		}
-	}`
-)
-
-func (b *ElasticSearchIndexer) realIndexerName() string {
-	return fmt.Sprintf("%s.v%d", b.indexerAliasName, esRepoIndexerLatestVersion)
-}
-
-// Init will initialize the indexer
-func (b *ElasticSearchIndexer) init() (bool, error) {
-	ctx := graceful.GetManager().HammerContext()
-	exists, err := b.client.IndexExists(b.realIndexerName()).Do(ctx)
-	if err != nil {
-		return false, b.checkError(err)
-	}
-	if !exists {
-		mapping := defaultMapping
-
-		createIndex, err := b.client.CreateIndex(b.realIndexerName()).BodyString(mapping).Do(ctx)
-		if err != nil {
-			return false, b.checkError(err)
-		}
-		if !createIndex.Acknowledged {
-			return false, fmt.Errorf("create index %s with %s failed", b.realIndexerName(), mapping)
-		}
-	}
-
-	// check version
-	r, err := b.client.Aliases().Do(ctx)
-	if err != nil {
-		return false, b.checkError(err)
-	}
-
-	realIndexerNames := r.IndicesByAlias(b.indexerAliasName)
-	if len(realIndexerNames) < 1 {
-		res, err := b.client.Alias().
-			Add(b.realIndexerName(), b.indexerAliasName).
-			Do(ctx)
-		if err != nil {
-			return false, b.checkError(err)
-		}
-		if !res.Acknowledged {
-			return false, fmt.Errorf("create alias %s to index %s failed", b.indexerAliasName, b.realIndexerName())
-		}
-	} else if len(realIndexerNames) >= 1 && realIndexerNames[0] < b.realIndexerName() {
-		log.Warn("Found older gitea indexer named %s, but we will create a new one %s and keep the old NOT DELETED. You can delete the old version after the upgrade succeed.",
-			realIndexerNames[0], b.realIndexerName())
-		res, err := b.client.Alias().
-			Remove(realIndexerNames[0], b.indexerAliasName).
-			Add(b.realIndexerName(), b.indexerAliasName).
-			Do(ctx)
-		if err != nil {
-			return false, b.checkError(err)
-		}
-		if !res.Acknowledged {
-			return false, fmt.Errorf("change alias %s to index %s failed", b.indexerAliasName, b.realIndexerName())
-		}
-	}
-
-	return exists, nil
-}
-
-// Ping checks if elastic is available
-func (b *ElasticSearchIndexer) Ping() bool {
-	b.lock.RLock()
-	defer b.lock.RUnlock()
-	return b.available
-}
-
-func (b *ElasticSearchIndexer) addUpdate(ctx context.Context, batchWriter git.WriteCloserError, batchReader *bufio.Reader, sha string, update fileUpdate, repo *repo_model.Repository) ([]elastic.BulkableRequest, error) {
-	// Ignore vendored files in code search
-	if setting.Indexer.ExcludeVendored && analyze.IsVendor(update.Filename) {
-		return nil, nil
-	}
-
-	size := update.Size
-	var err error
-	if !update.Sized {
-		var stdout string
-		stdout, _, err = git.NewCommand(ctx, "cat-file", "-s").AddDynamicArguments(update.BlobSha).RunStdString(&git.RunOpts{Dir: repo.RepoPath()})
-		if err != nil {
-			return nil, err
-		}
-		if size, err = strconv.ParseInt(strings.TrimSpace(stdout), 10, 64); err != nil {
-			return nil, fmt.Errorf("misformatted git cat-file output: %w", err)
-		}
-	}
-
-	if size > setting.Indexer.MaxIndexerFileSize {
-		return []elastic.BulkableRequest{b.addDelete(update.Filename, repo)}, nil
-	}
-
-	if _, err := batchWriter.Write([]byte(update.BlobSha + "\n")); err != nil {
-		return nil, err
-	}
-
-	_, _, size, err = git.ReadBatchLine(batchReader)
-	if err != nil {
-		return nil, err
-	}
-
-	fileContents, err := io.ReadAll(io.LimitReader(batchReader, size))
-	if err != nil {
-		return nil, err
-	} else if !typesniffer.DetectContentType(fileContents).IsText() {
-		// FIXME: UTF-16 files will probably fail here
-		return nil, nil
-	}
-
-	if _, err = batchReader.Discard(1); err != nil {
-		return nil, err
-	}
-	id := filenameIndexerID(repo.ID, update.Filename)
-
-	return []elastic.BulkableRequest{
-		elastic.NewBulkIndexRequest().
-			Index(b.indexerAliasName).
-			Id(id).
-			Doc(map[string]interface{}{
-				"repo_id":    repo.ID,
-				"content":    string(charset.ToUTF8DropErrors(fileContents)),
-				"commit_id":  sha,
-				"language":   analyze.GetCodeLanguage(update.Filename, fileContents),
-				"updated_at": timeutil.TimeStampNow(),
-			}),
-	}, nil
-}
-
-func (b *ElasticSearchIndexer) addDelete(filename string, repo *repo_model.Repository) elastic.BulkableRequest {
-	id := filenameIndexerID(repo.ID, filename)
-	return elastic.NewBulkDeleteRequest().
-		Index(b.indexerAliasName).
-		Id(id)
-}
-
-// Index will save the index data
-func (b *ElasticSearchIndexer) Index(ctx context.Context, repo *repo_model.Repository, sha string, changes *repoChanges) error {
-	reqs := make([]elastic.BulkableRequest, 0)
-	if len(changes.Updates) > 0 {
-		// Now because of some insanity with git cat-file not immediately failing if not run in a valid git directory we need to run git rev-parse first!
-		if err := git.EnsureValidGitRepository(ctx, repo.RepoPath()); err != nil {
-			log.Error("Unable to open git repo: %s for %-v: %v", repo.RepoPath(), repo, err)
-			return err
-		}
-
-		batchWriter, batchReader, cancel := git.CatFileBatch(ctx, repo.RepoPath())
-		defer cancel()
-
-		for _, update := range changes.Updates {
-			updateReqs, err := b.addUpdate(ctx, batchWriter, batchReader, sha, update, repo)
-			if err != nil {
-				return err
-			}
-			if len(updateReqs) > 0 {
-				reqs = append(reqs, updateReqs...)
-			}
-		}
-		cancel()
-	}
-
-	for _, filename := range changes.RemovedFilenames {
-		reqs = append(reqs, b.addDelete(filename, repo))
-	}
-
-	if len(reqs) > 0 {
-		_, err := b.client.Bulk().
-			Index(b.indexerAliasName).
-			Add(reqs...).
-			Do(ctx)
-		return b.checkError(err)
-	}
-	return nil
-}
-
-// Delete deletes indexes by ids
-func (b *ElasticSearchIndexer) Delete(repoID int64) error {
-	_, err := b.client.DeleteByQuery(b.indexerAliasName).
-		Query(elastic.NewTermsQuery("repo_id", repoID)).
-		Do(graceful.GetManager().HammerContext())
-	return b.checkError(err)
-}
-
-// indexPos find words positions for start and the following end on content. It will
-// return the beginning position of the first start and the ending position of the
-// first end following the start string.
-// If not found any of the positions, it will return -1, -1.
-func indexPos(content, start, end string) (int, int) {
-	startIdx := strings.Index(content, start)
-	if startIdx < 0 {
-		return -1, -1
-	}
-	endIdx := strings.Index(content[startIdx+len(start):], end)
-	if endIdx < 0 {
-		return -1, -1
-	}
-	return startIdx, startIdx + len(start) + endIdx + len(end)
-}
-
-func convertResult(searchResult *elastic.SearchResult, kw string, pageSize int) (int64, []*SearchResult, []*SearchResultLanguages, error) {
-	hits := make([]*SearchResult, 0, pageSize)
-	for _, hit := range searchResult.Hits.Hits {
-		// FIXME: There is no way to get the position the keyword on the content currently on the same request.
-		// So we get it from content, this may made the query slower. See
-		// https://discuss.elastic.co/t/fetching-position-of-keyword-in-matched-document/94291
-		var startIndex, endIndex int
-		c, ok := hit.Highlight["content"]
-		if ok && len(c) > 0 {
-			// FIXME: Since the highlighting content will include <em> and </em> for the keywords,
-			// now we should find the positions. But how to avoid html content which contains the
-			// <em> and </em> tags? If elastic search has handled that?
-			startIndex, endIndex = indexPos(c[0], "<em>", "</em>")
-			if startIndex == -1 {
-				panic(fmt.Sprintf("1===%s,,,%#v,,,%s", kw, hit.Highlight, c[0]))
-			}
-		} else {
-			panic(fmt.Sprintf("2===%#v", hit.Highlight))
-		}
-
-		repoID, fileName := parseIndexerID(hit.Id)
-		res := make(map[string]interface{})
-		if err := json.Unmarshal(hit.Source, &res); err != nil {
-			return 0, nil, nil, err
-		}
-
-		language := res["language"].(string)
-
-		hits = append(hits, &SearchResult{
-			RepoID:      repoID,
-			Filename:    fileName,
-			CommitID:    res["commit_id"].(string),
-			Content:     res["content"].(string),
-			UpdatedUnix: timeutil.TimeStamp(res["updated_at"].(float64)),
-			Language:    language,
-			StartIndex:  startIndex,
-			EndIndex:    endIndex - 9, // remove the length <em></em> since we give Content the original data
-			Color:       enry.GetColor(language),
-		})
-	}
-
-	return searchResult.TotalHits(), hits, extractAggs(searchResult), nil
-}
-
-func extractAggs(searchResult *elastic.SearchResult) []*SearchResultLanguages {
-	var searchResultLanguages []*SearchResultLanguages
-	agg, found := searchResult.Aggregations.Terms("language")
-	if found {
-		searchResultLanguages = make([]*SearchResultLanguages, 0, 10)
-
-		for _, bucket := range agg.Buckets {
-			searchResultLanguages = append(searchResultLanguages, &SearchResultLanguages{
-				Language: bucket.Key.(string),
-				Color:    enry.GetColor(bucket.Key.(string)),
-				Count:    int(bucket.DocCount),
-			})
-		}
-	}
-	return searchResultLanguages
-}
-
-// Search searches for codes and language stats by given conditions.
-func (b *ElasticSearchIndexer) Search(ctx context.Context, repoIDs []int64, language, keyword string, page, pageSize int, isMatch bool) (int64, []*SearchResult, []*SearchResultLanguages, error) {
-	searchType := esMultiMatchTypeBestFields
-	if isMatch {
-		searchType = esMultiMatchTypePhrasePrefix
-	}
-
-	kwQuery := elastic.NewMultiMatchQuery(keyword, "content").Type(searchType)
-	query := elastic.NewBoolQuery()
-	query = query.Must(kwQuery)
-	if len(repoIDs) > 0 {
-		repoStrs := make([]interface{}, 0, len(repoIDs))
-		for _, repoID := range repoIDs {
-			repoStrs = append(repoStrs, repoID)
-		}
-		repoQuery := elastic.NewTermsQuery("repo_id", repoStrs...)
-		query = query.Must(repoQuery)
-	}
-
-	var (
-		start       int
-		kw          = "<em>" + keyword + "</em>"
-		aggregation = elastic.NewTermsAggregation().Field("language").Size(10).OrderByCountDesc()
-	)
-
-	if page > 0 {
-		start = (page - 1) * pageSize
-	}
-
-	if len(language) == 0 {
-		searchResult, err := b.client.Search().
-			Index(b.indexerAliasName).
-			Aggregation("language", aggregation).
-			Query(query).
-			Highlight(
-				elastic.NewHighlight().
-					Field("content").
-					NumOfFragments(0). // return all highting content on fragments
-					HighlighterType("fvh"),
-			).
-			Sort("repo_id", true).
-			From(start).Size(pageSize).
-			Do(ctx)
-		if err != nil {
-			return 0, nil, nil, b.checkError(err)
-		}
-
-		return convertResult(searchResult, kw, pageSize)
-	}
-
-	langQuery := elastic.NewMatchQuery("language", language)
-	countResult, err := b.client.Search().
-		Index(b.indexerAliasName).
-		Aggregation("language", aggregation).
-		Query(query).
-		Size(0). // We only needs stats information
-		Do(ctx)
-	if err != nil {
-		return 0, nil, nil, b.checkError(err)
-	}
-
-	query = query.Must(langQuery)
-	searchResult, err := b.client.Search().
-		Index(b.indexerAliasName).
-		Query(query).
-		Highlight(
-			elastic.NewHighlight().
-				Field("content").
-				NumOfFragments(0). // return all highting content on fragments
-				HighlighterType("fvh"),
-		).
-		Sort("repo_id", true).
-		From(start).Size(pageSize).
-		Do(ctx)
-	if err != nil {
-		return 0, nil, nil, b.checkError(err)
-	}
-
-	total, hits, _, err := convertResult(searchResult, kw, pageSize)
-
-	return total, hits, extractAggs(countResult), err
-}
-
-// Close implements indexer
-func (b *ElasticSearchIndexer) Close() {
-	select {
-	case <-b.stopTimer:
-	default:
-		close(b.stopTimer)
-	}
-}
-
-func (b *ElasticSearchIndexer) checkError(err error) error {
-	var opErr *net.OpError
-	if !(elastic.IsConnErr(err) || (errors.As(err, &opErr) && (opErr.Op == "dial" || opErr.Op == "read"))) {
-		return err
-	}
-
-	b.setAvailability(false)
-
-	return err
-}
-
-func (b *ElasticSearchIndexer) checkAvailability() {
-	if b.Ping() {
-		return
-	}
-
-	// Request cluster state to check if elastic is available again
-	_, err := b.client.ClusterState().Do(graceful.GetManager().ShutdownContext())
-	if err != nil {
-		b.setAvailability(false)
-		return
-	}
-
-	b.setAvailability(true)
-}
-
-func (b *ElasticSearchIndexer) setAvailability(available bool) {
-	b.lock.Lock()
-	defer b.lock.Unlock()
-
-	if b.available == available {
-		return
-	}
-
-	b.available = available
-}
```
-41
modules/indexer/code/elastic_search_test.go
··· 1 - // Copyright 2020 The Gitea Authors. All rights reserved. 2 - // SPDX-License-Identifier: MIT 3 - 4 - package code 5 - 6 - import ( 7 - "os" 8 - "testing" 9 - 10 - "code.gitea.io/gitea/models/unittest" 11 - 12 - "github.com/stretchr/testify/assert" 13 - ) 14 - 15 - func TestESIndexAndSearch(t *testing.T) { 16 - unittest.PrepareTestEnv(t) 17 - 18 - u := os.Getenv("TEST_INDEXER_CODE_ES_URL") 19 - if u == "" { 20 - t.SkipNow() 21 - return 22 - } 23 - 24 - indexer, _, err := NewElasticSearchIndexer(u, "gitea_codes") 25 - if err != nil { 26 - assert.Fail(t, "Unable to create ES indexer Error: %v", err) 27 - if indexer != nil { 28 - indexer.Close() 29 - } 30 - return 31 - } 32 - defer indexer.Close() 33 - 34 - testIndexer("elastic_search", t, indexer) 35 - } 36 - 37 - func TestIndexPos(t *testing.T) { 38 - startIdx, endIdx := indexPos("test index start and end", "start", "end") 39 - assert.EqualValues(t, 11, startIdx) 40 - assert.EqualValues(t, 24, endIdx) 41 - }
+358
modules/indexer/code/elasticsearch/elasticsearch.go
··· 1 + // Copyright 2020 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package elasticsearch 5 + 6 + import ( 7 + "bufio" 8 + "context" 9 + "fmt" 10 + "io" 11 + "strconv" 12 + "strings" 13 + 14 + repo_model "code.gitea.io/gitea/models/repo" 15 + "code.gitea.io/gitea/modules/analyze" 16 + "code.gitea.io/gitea/modules/charset" 17 + "code.gitea.io/gitea/modules/git" 18 + "code.gitea.io/gitea/modules/indexer/code/internal" 19 + indexer_internal "code.gitea.io/gitea/modules/indexer/internal" 20 + inner_elasticsearch "code.gitea.io/gitea/modules/indexer/internal/elasticsearch" 21 + "code.gitea.io/gitea/modules/json" 22 + "code.gitea.io/gitea/modules/log" 23 + "code.gitea.io/gitea/modules/setting" 24 + "code.gitea.io/gitea/modules/timeutil" 25 + "code.gitea.io/gitea/modules/typesniffer" 26 + 27 + "github.com/go-enry/go-enry/v2" 28 + "github.com/olivere/elastic/v7" 29 + ) 30 + 31 + const ( 32 + esRepoIndexerLatestVersion = 1 33 + // multi-match-types, currently only 2 types are used 34 + // Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.0/query-dsl-multi-match-query.html#multi-match-types 35 + esMultiMatchTypeBestFields = "best_fields" 36 + esMultiMatchTypePhrasePrefix = "phrase_prefix" 37 + ) 38 + 39 + var _ internal.Indexer = &Indexer{} 40 + 41 + // Indexer implements Indexer interface 42 + type Indexer struct { 43 + inner *inner_elasticsearch.Indexer 44 + indexer_internal.Indexer // do not composite inner_elasticsearch.Indexer directly to avoid exposing too much 45 + } 46 + 47 + // NewIndexer creates a new elasticsearch indexer 48 + func NewIndexer(url, indexerName string) *Indexer { 49 + inner := inner_elasticsearch.NewIndexer(url, indexerName, esRepoIndexerLatestVersion, defaultMapping) 50 + indexer := &Indexer{ 51 + inner: inner, 52 + Indexer: inner, 53 + } 54 + return indexer 55 + } 56 + 57 + const ( 58 + defaultMapping = `{ 59 + "mappings": { 60 + "properties": { 61 + "repo_id": { 62 + "type": "long", 63 
+ "index": true 64 + }, 65 + "content": { 66 + "type": "text", 67 + "term_vector": "with_positions_offsets", 68 + "index": true 69 + }, 70 + "commit_id": { 71 + "type": "keyword", 72 + "index": true 73 + }, 74 + "language": { 75 + "type": "keyword", 76 + "index": true 77 + }, 78 + "updated_at": { 79 + "type": "long", 80 + "index": true 81 + } 82 + } 83 + } 84 + }` 85 + ) 86 + 87 + func (b *Indexer) addUpdate(ctx context.Context, batchWriter git.WriteCloserError, batchReader *bufio.Reader, sha string, update internal.FileUpdate, repo *repo_model.Repository) ([]elastic.BulkableRequest, error) { 88 + // Ignore vendored files in code search 89 + if setting.Indexer.ExcludeVendored && analyze.IsVendor(update.Filename) { 90 + return nil, nil 91 + } 92 + 93 + size := update.Size 94 + var err error 95 + if !update.Sized { 96 + var stdout string 97 + stdout, _, err = git.NewCommand(ctx, "cat-file", "-s").AddDynamicArguments(update.BlobSha).RunStdString(&git.RunOpts{Dir: repo.RepoPath()}) 98 + if err != nil { 99 + return nil, err 100 + } 101 + if size, err = strconv.ParseInt(strings.TrimSpace(stdout), 10, 64); err != nil { 102 + return nil, fmt.Errorf("misformatted git cat-file output: %w", err) 103 + } 104 + } 105 + 106 + if size > setting.Indexer.MaxIndexerFileSize { 107 + return []elastic.BulkableRequest{b.addDelete(update.Filename, repo)}, nil 108 + } 109 + 110 + if _, err := batchWriter.Write([]byte(update.BlobSha + "\n")); err != nil { 111 + return nil, err 112 + } 113 + 114 + _, _, size, err = git.ReadBatchLine(batchReader) 115 + if err != nil { 116 + return nil, err 117 + } 118 + 119 + fileContents, err := io.ReadAll(io.LimitReader(batchReader, size)) 120 + if err != nil { 121 + return nil, err 122 + } else if !typesniffer.DetectContentType(fileContents).IsText() { 123 + // FIXME: UTF-16 files will probably fail here 124 + return nil, nil 125 + } 126 + 127 + if _, err = batchReader.Discard(1); err != nil { 128 + return nil, err 129 + } 130 + id := 
internal.FilenameIndexerID(repo.ID, update.Filename) 131 + 132 + return []elastic.BulkableRequest{ 133 + elastic.NewBulkIndexRequest(). 134 + Index(b.inner.VersionedIndexName()). 135 + Id(id). 136 + Doc(map[string]interface{}{ 137 + "repo_id": repo.ID, 138 + "content": string(charset.ToUTF8DropErrors(fileContents)), 139 + "commit_id": sha, 140 + "language": analyze.GetCodeLanguage(update.Filename, fileContents), 141 + "updated_at": timeutil.TimeStampNow(), 142 + }), 143 + }, nil 144 + } 145 + 146 + func (b *Indexer) addDelete(filename string, repo *repo_model.Repository) elastic.BulkableRequest { 147 + id := internal.FilenameIndexerID(repo.ID, filename) 148 + return elastic.NewBulkDeleteRequest(). 149 + Index(b.inner.VersionedIndexName()). 150 + Id(id) 151 + } 152 + 153 + // Index will save the index data 154 + func (b *Indexer) Index(ctx context.Context, repo *repo_model.Repository, sha string, changes *internal.RepoChanges) error { 155 + reqs := make([]elastic.BulkableRequest, 0) 156 + if len(changes.Updates) > 0 { 157 + // Now because of some insanity with git cat-file not immediately failing if not run in a valid git directory we need to run git rev-parse first! 158 + if err := git.EnsureValidGitRepository(ctx, repo.RepoPath()); err != nil { 159 + log.Error("Unable to open git repo: %s for %-v: %v", repo.RepoPath(), repo, err) 160 + return err 161 + } 162 + 163 + batchWriter, batchReader, cancel := git.CatFileBatch(ctx, repo.RepoPath()) 164 + defer cancel() 165 + 166 + for _, update := range changes.Updates { 167 + updateReqs, err := b.addUpdate(ctx, batchWriter, batchReader, sha, update, repo) 168 + if err != nil { 169 + return err 170 + } 171 + if len(updateReqs) > 0 { 172 + reqs = append(reqs, updateReqs...) 173 + } 174 + } 175 + cancel() 176 + } 177 + 178 + for _, filename := range changes.RemovedFilenames { 179 + reqs = append(reqs, b.addDelete(filename, repo)) 180 + } 181 + 182 + if len(reqs) > 0 { 183 + _, err := b.inner.Client.Bulk(). 
184 + Index(b.inner.VersionedIndexName()). 185 + Add(reqs...). 186 + Do(ctx) 187 + return err 188 + } 189 + return nil 190 + } 191 + 192 + // Delete deletes index entries by repo ID 193 + func (b *Indexer) Delete(ctx context.Context, repoID int64) error { 194 + _, err := b.inner.Client.DeleteByQuery(b.inner.VersionedIndexName()). 195 + Query(elastic.NewTermsQuery("repo_id", repoID)). 196 + Do(ctx) 197 + return err 198 + } 199 + 200 + // indexPos finds the positions of start and the following end in content. It will 201 + // return the beginning position of the first start and the ending position of the 202 + // first end following the start string. 203 + // If either position is not found, it returns -1, -1. 204 + func indexPos(content, start, end string) (int, int) { 205 + startIdx := strings.Index(content, start) 206 + if startIdx < 0 { 207 + return -1, -1 208 + } 209 + endIdx := strings.Index(content[startIdx+len(start):], end) 210 + if endIdx < 0 { 211 + return -1, -1 212 + } 213 + return startIdx, startIdx + len(start) + endIdx + len(end) 214 + } 215 + 216 + func convertResult(searchResult *elastic.SearchResult, kw string, pageSize int) (int64, []*internal.SearchResult, []*internal.SearchResultLanguages, error) { 217 + hits := make([]*internal.SearchResult, 0, pageSize) 218 + for _, hit := range searchResult.Hits.Hits { 219 + // FIXME: There is currently no way to get the position of the keyword in the content in the same request. 220 + // So we get it from the content, which may make the query slower. See 221 + // https://discuss.elastic.co/t/fetching-position-of-keyword-in-matched-document/94291 222 + var startIndex, endIndex int 223 + c, ok := hit.Highlight["content"] 224 + if ok && len(c) > 0 { 225 + // FIXME: Since the highlighted content will include <em> and </em> around the keywords, 226 + // we need to find the positions. But how do we avoid HTML content that already contains 227 + // <em> and </em> tags? Does Elasticsearch handle that? 
228 + startIndex, endIndex = indexPos(c[0], "<em>", "</em>") 229 + if startIndex == -1 { 230 + panic(fmt.Sprintf("1===%s,,,%#v,,,%s", kw, hit.Highlight, c[0])) 231 + } 232 + } else { 233 + panic(fmt.Sprintf("2===%#v", hit.Highlight)) 234 + } 235 + 236 + repoID, fileName := internal.ParseIndexerID(hit.Id) 237 + res := make(map[string]interface{}) 238 + if err := json.Unmarshal(hit.Source, &res); err != nil { 239 + return 0, nil, nil, err 240 + } 241 + 242 + language := res["language"].(string) 243 + 244 + hits = append(hits, &internal.SearchResult{ 245 + RepoID: repoID, 246 + Filename: fileName, 247 + CommitID: res["commit_id"].(string), 248 + Content: res["content"].(string), 249 + UpdatedUnix: timeutil.TimeStamp(res["updated_at"].(float64)), 250 + Language: language, 251 + StartIndex: startIndex, 252 + EndIndex: endIndex - 9, // remove the length <em></em> since we give Content the original data 253 + Color: enry.GetColor(language), 254 + }) 255 + } 256 + 257 + return searchResult.TotalHits(), hits, extractAggs(searchResult), nil 258 + } 259 + 260 + func extractAggs(searchResult *elastic.SearchResult) []*internal.SearchResultLanguages { 261 + var searchResultLanguages []*internal.SearchResultLanguages 262 + agg, found := searchResult.Aggregations.Terms("language") 263 + if found { 264 + searchResultLanguages = make([]*internal.SearchResultLanguages, 0, 10) 265 + 266 + for _, bucket := range agg.Buckets { 267 + searchResultLanguages = append(searchResultLanguages, &internal.SearchResultLanguages{ 268 + Language: bucket.Key.(string), 269 + Color: enry.GetColor(bucket.Key.(string)), 270 + Count: int(bucket.DocCount), 271 + }) 272 + } 273 + } 274 + return searchResultLanguages 275 + } 276 + 277 + // Search searches for codes and language stats by given conditions. 
278 + func (b *Indexer) Search(ctx context.Context, repoIDs []int64, language, keyword string, page, pageSize int, isMatch bool) (int64, []*internal.SearchResult, []*internal.SearchResultLanguages, error) { 279 + searchType := esMultiMatchTypeBestFields 280 + if isMatch { 281 + searchType = esMultiMatchTypePhrasePrefix 282 + } 283 + 284 + kwQuery := elastic.NewMultiMatchQuery(keyword, "content").Type(searchType) 285 + query := elastic.NewBoolQuery() 286 + query = query.Must(kwQuery) 287 + if len(repoIDs) > 0 { 288 + repoStrs := make([]interface{}, 0, len(repoIDs)) 289 + for _, repoID := range repoIDs { 290 + repoStrs = append(repoStrs, repoID) 291 + } 292 + repoQuery := elastic.NewTermsQuery("repo_id", repoStrs...) 293 + query = query.Must(repoQuery) 294 + } 295 + 296 + var ( 297 + start int 298 + kw = "<em>" + keyword + "</em>" 299 + aggregation = elastic.NewTermsAggregation().Field("language").Size(10).OrderByCountDesc() 300 + ) 301 + 302 + if page > 0 { 303 + start = (page - 1) * pageSize 304 + } 305 + 306 + if len(language) == 0 { 307 + searchResult, err := b.inner.Client.Search(). 308 + Index(b.inner.VersionedIndexName()). 309 + Aggregation("language", aggregation). 310 + Query(query). 311 + Highlight( 312 + elastic.NewHighlight(). 313 + Field("content"). 314 + NumOfFragments(0). // return all highlighted content instead of fragments 315 + HighlighterType("fvh"), 316 + ). 317 + Sort("repo_id", true). 318 + From(start).Size(pageSize). 319 + Do(ctx) 320 + if err != nil { 321 + return 0, nil, nil, err 322 + } 323 + 324 + return convertResult(searchResult, kw, pageSize) 325 + } 326 + 327 + langQuery := elastic.NewMatchQuery("language", language) 328 + countResult, err := b.inner.Client.Search(). 329 + Index(b.inner.VersionedIndexName()). 330 + Aggregation("language", aggregation). 331 + Query(query). 332 + Size(0). 
// We only need stats information 333 + Do(ctx) 334 + if err != nil { 335 + return 0, nil, nil, err 336 + } 337 + 338 + query = query.Must(langQuery) 339 + searchResult, err := b.inner.Client.Search(). 340 + Index(b.inner.VersionedIndexName()). 341 + Query(query). 342 + Highlight( 343 + elastic.NewHighlight(). 344 + Field("content"). 345 + NumOfFragments(0). // return all highlighted content instead of fragments 346 + HighlighterType("fvh"), 347 + ). 348 + Sort("repo_id", true). 349 + From(start).Size(pageSize). 350 + Do(ctx) 351 + if err != nil { 352 + return 0, nil, nil, err 353 + } 354 + 355 + total, hits, _, err := convertResult(searchResult, kw, pageSize) 356 + 357 + return total, hits, extractAggs(countResult), err 358 + }
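`convertResult` recovers the keyword position by scanning the highlighted fragment for the `<em>`/`</em>` markers Elasticsearch inserts, then subtracting their combined length (`len("<em></em>") == 9`) so the offsets apply to the original, un-highlighted `Content`. A standalone sketch of that offset arithmetic, reusing the `indexPos` helper from the diff verbatim (the sample string is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// indexPos mirrors the helper above: it returns the start position of `start`
// and the end position (exclusive) of the first `end` following it, or -1, -1.
func indexPos(content, start, end string) (int, int) {
	startIdx := strings.Index(content, start)
	if startIdx < 0 {
		return -1, -1
	}
	endIdx := strings.Index(content[startIdx+len(start):], end)
	if endIdx < 0 {
		return -1, -1
	}
	return startIdx, startIdx + len(start) + endIdx + len(end)
}

func main() {
	highlighted := "func <em>Search</em>(ctx"
	startIdx, endIdx := indexPos(highlighted, "<em>", "</em>")
	// len("<em>") + len("</em>") == 9, so endIdx-9 is where the keyword
	// ends in the original content ("func Search(ctx" → "Search" is [5, 11)).
	fmt.Println(startIdx, endIdx-9)
}
```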
+16
modules/indexer/code/elasticsearch/elasticsearch_test.go
··· 1 + // Copyright 2020 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package elasticsearch 5 + 6 + import ( 7 + "testing" 8 + 9 + "github.com/stretchr/testify/assert" 10 + ) 11 + 12 + func TestIndexPos(t *testing.T) { 13 + startIdx, endIdx := indexPos("test index start and end", "start", "end") 14 + assert.EqualValues(t, 11, startIdx) 15 + assert.EqualValues(t, 24, endIdx) 16 + }
+10 -22
modules/indexer/code/git.go
··· 10 10 11 11 repo_model "code.gitea.io/gitea/models/repo" 12 12 "code.gitea.io/gitea/modules/git" 13 + "code.gitea.io/gitea/modules/indexer/code/internal" 13 14 "code.gitea.io/gitea/modules/log" 14 15 "code.gitea.io/gitea/modules/setting" 15 16 ) 16 17 17 - type fileUpdate struct { 18 - Filename string 19 - BlobSha string 20 - Size int64 21 - Sized bool 22 - } 23 - 24 - // repoChanges changes (file additions/updates/removals) to a repo 25 - type repoChanges struct { 26 - Updates []fileUpdate 27 - RemovedFilenames []string 28 - } 29 - 30 18 func getDefaultBranchSha(ctx context.Context, repo *repo_model.Repository) (string, error) { 31 19 stdout, _, err := git.NewCommand(ctx, "show-ref", "-s").AddDynamicArguments(git.BranchPrefix + repo.DefaultBranch).RunStdString(&git.RunOpts{Dir: repo.RepoPath()}) 32 20 if err != nil { ··· 36 24 } 37 25 38 26 // getRepoChanges returns changes to repo since last indexer update 39 - func getRepoChanges(ctx context.Context, repo *repo_model.Repository, revision string) (*repoChanges, error) { 27 + func getRepoChanges(ctx context.Context, repo *repo_model.Repository, revision string) (*internal.RepoChanges, error) { 40 28 status, err := repo_model.GetIndexerStatus(ctx, repo, repo_model.RepoIndexerTypeCode) 41 29 if err != nil { 42 30 return nil, err ··· 67 55 } 68 56 69 57 // parseGitLsTreeOutput parses the output of a `git ls-tree -r --full-name` command 70 - func parseGitLsTreeOutput(stdout []byte) ([]fileUpdate, error) { 58 + func parseGitLsTreeOutput(stdout []byte) ([]internal.FileUpdate, error) { 71 59 entries, err := git.ParseTreeEntries(stdout) 72 60 if err != nil { 73 61 return nil, err 74 62 } 75 63 idxCount := 0 76 - updates := make([]fileUpdate, len(entries)) 64 + updates := make([]internal.FileUpdate, len(entries)) 77 65 for _, entry := range entries { 78 66 if isIndexable(entry) { 79 - updates[idxCount] = fileUpdate{ 67 + updates[idxCount] = internal.FileUpdate{ 80 68 Filename: entry.Name(), 81 69 BlobSha: 
entry.ID.String(), 82 70 Size: entry.Size(), ··· 89 77 } 90 78 91 79 // genesisChanges get changes to add repo to the indexer for the first time 92 - func genesisChanges(ctx context.Context, repo *repo_model.Repository, revision string) (*repoChanges, error) { 93 - var changes repoChanges 80 + func genesisChanges(ctx context.Context, repo *repo_model.Repository, revision string) (*internal.RepoChanges, error) { 81 + var changes internal.RepoChanges 94 82 stdout, _, runErr := git.NewCommand(ctx, "ls-tree", "--full-tree", "-l", "-r").AddDynamicArguments(revision).RunStdBytes(&git.RunOpts{Dir: repo.RepoPath()}) 95 83 if runErr != nil { 96 84 return nil, runErr ··· 102 90 } 103 91 104 92 // nonGenesisChanges get changes since the previous indexer update 105 - func nonGenesisChanges(ctx context.Context, repo *repo_model.Repository, revision string) (*repoChanges, error) { 93 + func nonGenesisChanges(ctx context.Context, repo *repo_model.Repository, revision string) (*internal.RepoChanges, error) { 106 94 diffCmd := git.NewCommand(ctx, "diff", "--name-status").AddDynamicArguments(repo.CodeIndexerStatus.CommitSha, revision) 107 95 stdout, _, runErr := diffCmd.RunStdString(&git.RunOpts{Dir: repo.RepoPath()}) 108 96 if runErr != nil { 109 97 // previous commit sha may have been removed by a force push, so 110 98 // try rebuilding from scratch 111 99 log.Warn("git diff: %v", runErr) 112 - if err := indexer.Delete(repo.ID); err != nil { 100 + if err := (*globalIndexer.Load()).Delete(ctx, repo.ID); err != nil { 113 101 return nil, err 114 102 } 115 103 return genesisChanges(ctx, repo, revision) 116 104 } 117 105 118 - var changes repoChanges 106 + var changes internal.RepoChanges 119 107 var err error 120 108 updatedFilenames := make([]string, 0, 10) 121 109 for _, line := range strings.Split(stdout, "\n") {
+47 -98
modules/indexer/code/indexer.go
··· 7 7 "context" 8 8 "os" 9 9 "runtime/pprof" 10 - "strconv" 11 - "strings" 10 + "sync/atomic" 12 11 "time" 13 12 14 13 "code.gitea.io/gitea/models/db" 15 14 repo_model "code.gitea.io/gitea/models/repo" 16 15 "code.gitea.io/gitea/modules/graceful" 16 + "code.gitea.io/gitea/modules/indexer/code/bleve" 17 + "code.gitea.io/gitea/modules/indexer/code/elasticsearch" 18 + "code.gitea.io/gitea/modules/indexer/code/internal" 17 19 "code.gitea.io/gitea/modules/log" 18 20 "code.gitea.io/gitea/modules/process" 19 21 "code.gitea.io/gitea/modules/queue" 20 22 "code.gitea.io/gitea/modules/setting" 21 - "code.gitea.io/gitea/modules/timeutil" 22 23 "code.gitea.io/gitea/modules/util" 23 24 ) 24 25 25 - // SearchResult result of performing a search in a repo 26 - type SearchResult struct { 27 - RepoID int64 28 - StartIndex int 29 - EndIndex int 30 - Filename string 31 - Content string 32 - CommitID string 33 - UpdatedUnix timeutil.TimeStamp 34 - Language string 35 - Color string 36 - } 26 + var ( 27 + indexerQueue *queue.WorkerPoolQueue[*internal.IndexerData] 28 + // globalIndexer is the global indexer; it cannot be nil. 29 + // When the real indexer is not ready, it is a dummy indexer whose methods return an error explaining that it's not ready. 30 + // So it's always safe to use it as *globalIndexer.Load() and call its methods. 
31 + globalIndexer atomic.Pointer[internal.Indexer] 32 + dummyIndexer *internal.Indexer 33 + ) 37 34 38 - // SearchResultLanguages result of top languages count in search results 39 - type SearchResultLanguages struct { 40 - Language string 41 - Color string 42 - Count int 35 + func init() { 36 + i := internal.NewDummyIndexer() 37 + dummyIndexer = &i 38 + globalIndexer.Store(dummyIndexer) 43 39 } 44 40 45 - // Indexer defines an interface to index and search code contents 46 - type Indexer interface { 47 - Ping() bool 48 - Index(ctx context.Context, repo *repo_model.Repository, sha string, changes *repoChanges) error 49 - Delete(repoID int64) error 50 - Search(ctx context.Context, repoIDs []int64, language, keyword string, page, pageSize int, isMatch bool) (int64, []*SearchResult, []*SearchResultLanguages, error) 51 - Close() 52 - } 53 - 54 - func filenameIndexerID(repoID int64, filename string) string { 55 - return indexerID(repoID) + "_" + filename 56 - } 57 - 58 - func indexerID(id int64) string { 59 - return strconv.FormatInt(id, 36) 60 - } 61 - 62 - func parseIndexerID(indexerID string) (int64, string) { 63 - index := strings.IndexByte(indexerID, '_') 64 - if index == -1 { 65 - log.Error("Unexpected ID in repo indexer: %s", indexerID) 66 - } 67 - repoID, _ := strconv.ParseInt(indexerID[:index], 36, 64) 68 - return repoID, indexerID[index+1:] 69 - } 70 - 71 - func filenameOfIndexerID(indexerID string) string { 72 - index := strings.IndexByte(indexerID, '_') 73 - if index == -1 { 74 - log.Error("Unexpected ID in repo indexer: %s", indexerID) 75 - } 76 - return indexerID[index+1:] 77 - } 78 - 79 - // IndexerData represents data stored in the code indexer 80 - type IndexerData struct { 81 - RepoID int64 82 - } 83 - 84 - var indexerQueue *queue.WorkerPoolQueue[*IndexerData] 85 - 86 - func index(ctx context.Context, indexer Indexer, repoID int64) error { 41 + func index(ctx context.Context, indexer internal.Indexer, repoID int64) error { 87 42 repo, err := 
repo_model.GetRepositoryByID(ctx, repoID) 88 43 if repo_model.IsErrRepoNotExist(err) { 89 - return indexer.Delete(repoID) 44 + return indexer.Delete(ctx, repoID) 90 45 } 91 46 if err != nil { 92 47 return err ··· 139 94 // Init initialize the repo indexer 140 95 func Init() { 141 96 if !setting.Indexer.RepoIndexerEnabled { 142 - indexer.Close() 97 + (*globalIndexer.Load()).Close() 143 98 return 144 99 } 145 100 ··· 153 108 } 154 109 cancel() 155 110 log.Debug("Closing repository indexer") 156 - indexer.Close() 111 + (*globalIndexer.Load()).Close() 157 112 log.Info("PID: %d Repository Indexer closed", os.Getpid()) 158 113 finished() 159 114 }) ··· 163 118 // Create the Queue 164 119 switch setting.Indexer.RepoType { 165 120 case "bleve", "elasticsearch": 166 - handler := func(items ...*IndexerData) (unhandled []*IndexerData) { 167 - idx, err := indexer.get() 168 - if idx == nil || err != nil { 169 - log.Warn("Codes indexer handler: indexer is not ready, retry later.") 170 - return items 171 - } 172 - 121 + handler := func(items ...*internal.IndexerData) (unhandled []*internal.IndexerData) { 122 + indexer := *globalIndexer.Load() 173 123 for _, indexerData := range items { 174 124 log.Trace("IndexerData Process Repo: %d", indexerData.RepoID) 175 125 ··· 188 138 code.gitea.io/gitea/modules/indexer/code.index(indexer.go:105) 189 139 */ 190 140 if err := index(ctx, indexer, indexerData.RepoID); err != nil { 191 - if !idx.Ping() { 192 - log.Error("Code indexer handler: indexer is unavailable.") 193 - unhandled = append(unhandled, indexerData) 194 - continue 195 - } 141 + unhandled = append(unhandled, indexerData) 196 142 if !setting.IsInTesting { 197 143 log.Error("Codes indexer handler: index error for repo %v: %v", indexerData.RepoID, err) 198 144 } ··· 213 159 pprof.SetGoroutineLabels(ctx) 214 160 start := time.Now() 215 161 var ( 216 - rIndexer Indexer 217 - populate bool 162 + rIndexer internal.Indexer 163 + existed bool 218 164 err error 219 165 ) 220 166 switch 
setting.Indexer.RepoType { ··· 228 174 } 229 175 }() 230 176 231 - rIndexer, populate, err = NewBleveIndexer(setting.Indexer.RepoPath) 177 + rIndexer = bleve.NewIndexer(setting.Indexer.RepoPath) 178 + existed, err = rIndexer.Init(ctx) 232 179 if err != nil { 233 180 cancel() 234 - indexer.Close() 181 + (*globalIndexer.Load()).Close() 235 182 close(waitChannel) 236 183 log.Fatal("PID: %d Unable to initialize the bleve Repository Indexer at path: %s Error: %v", os.Getpid(), setting.Indexer.RepoPath, err) 237 184 } ··· 245 192 } 246 193 }() 247 194 248 - rIndexer, populate, err = NewElasticSearchIndexer(setting.Indexer.RepoConnStr, setting.Indexer.RepoIndexerName) 195 + rIndexer = elasticsearch.NewIndexer(setting.Indexer.RepoConnStr, setting.Indexer.RepoIndexerName) 249 196 if err != nil { 250 197 cancel() 251 - indexer.Close() 198 + (*globalIndexer.Load()).Close() 199 + close(waitChannel) 200 + log.Fatal("PID: %d Unable to create the elasticsearch Repository Indexer connstr: %s Error: %v", os.Getpid(), setting.Indexer.RepoConnStr, err) 201 + } 202 + existed, err = rIndexer.Init(ctx) 203 + if err != nil { 204 + cancel() 205 + (*globalIndexer.Load()).Close() 252 206 close(waitChannel) 253 207 log.Fatal("PID: %d Unable to initialize the elasticsearch Repository Indexer connstr: %s Error: %v", os.Getpid(), setting.Indexer.RepoConnStr, err) 254 208 } 209 + 255 210 default: 256 211 log.Fatal("PID: %d Unknown Indexer type: %s", os.Getpid(), setting.Indexer.RepoType) 257 212 } 258 213 259 - indexer.set(rIndexer) 214 + globalIndexer.Store(&rIndexer) 260 215 261 216 // Start processing the queue 262 217 go graceful.GetManager().RunWithCancel(indexerQueue) 263 218 264 - if populate { 219 + if !existed { // populate the index because it's created for the first time 265 220 go graceful.GetManager().RunWithShutdownContext(populateRepoIndexer) 266 221 } 267 222 select { ··· 283 238 case <-graceful.GetManager().IsShutdown(): 284 239 log.Warn("Shutdown before Repository Indexer 
completed initialization") 285 240 cancel() 286 - indexer.Close() 241 + (*globalIndexer.Load()).Close() 287 242 case duration, ok := <-waitChannel: 288 243 if !ok { 289 244 log.Warn("Repository Indexer Initialization failed") 290 245 cancel() 291 - indexer.Close() 246 + (*globalIndexer.Load()).Close() 292 247 return 293 248 } 294 249 log.Info("Repository Indexer Initialization took %v", duration) 295 250 case <-time.After(timeout): 296 251 cancel() 297 - indexer.Close() 252 + (*globalIndexer.Load()).Close() 298 253 log.Fatal("Repository Indexer Initialization Timed-Out after: %v", timeout) 299 254 } 300 255 }() ··· 303 258 304 259 // UpdateRepoIndexer update a repository's entries in the indexer 305 260 func UpdateRepoIndexer(repo *repo_model.Repository) { 306 - indexData := &IndexerData{RepoID: repo.ID} 261 + indexData := &internal.IndexerData{RepoID: repo.ID} 307 262 if err := indexerQueue.Push(indexData); err != nil { 308 263 log.Error("Update repo index data %v failed: %v", indexData, err) 309 264 } 310 265 } 311 266 312 267 // IsAvailable checks if issue indexer is available 313 - func IsAvailable() bool { 314 - idx, err := indexer.get() 315 - if err != nil { 316 - log.Error("IsAvailable(): unable to get indexer: %v", err) 317 - return false 318 - } 319 - 320 - return idx.Ping() 268 + func IsAvailable(ctx context.Context) bool { 269 + return (*globalIndexer.Load()).Ping(ctx) == nil 321 270 } 322 271 323 272 // populateRepoIndexer populate the repo indexer with pre-existing data. This ··· 368 317 return 369 318 default: 370 319 } 371 - if err := indexerQueue.Push(&IndexerData{RepoID: id}); err != nil { 320 + if err := indexerQueue.Push(&internal.IndexerData{RepoID: id}); err != nil { 372 321 log.Error("indexerQueue.Push: %v", err) 373 322 return 374 323 }
+48 -2
modules/indexer/code/indexer_test.go
··· 5 5 6 6 import ( 7 7 "context" 8 + "os" 8 9 "path/filepath" 9 10 "testing" 10 11 11 12 "code.gitea.io/gitea/models/unittest" 12 13 "code.gitea.io/gitea/modules/git" 14 + "code.gitea.io/gitea/modules/indexer/code/bleve" 15 + "code.gitea.io/gitea/modules/indexer/code/elasticsearch" 16 + "code.gitea.io/gitea/modules/indexer/code/internal" 13 17 14 18 _ "code.gitea.io/gitea/models" 15 19 ··· 22 26 }) 23 27 } 24 28 25 - func testIndexer(name string, t *testing.T, indexer Indexer) { 29 + func testIndexer(name string, t *testing.T, indexer internal.Indexer) { 26 30 t.Run(name, func(t *testing.T) { 27 31 var repoID int64 = 1 28 32 err := index(git.DefaultContext, indexer, repoID) ··· 81 85 }) 82 86 } 83 87 84 - assert.NoError(t, indexer.Delete(repoID)) 88 + assert.NoError(t, indexer.Delete(context.Background(), repoID)) 85 89 }) 86 90 } 91 + 92 + func TestBleveIndexAndSearch(t *testing.T) { 93 + unittest.PrepareTestEnv(t) 94 + 95 + dir := t.TempDir() 96 + 97 + idx := bleve.NewIndexer(dir) 98 + _, err := idx.Init(context.Background()) 99 + if err != nil { 100 + assert.Fail(t, "Unable to create bleve indexer Error: %v", err) 101 + if idx != nil { 102 + idx.Close() 103 + } 104 + return 105 + } 106 + defer idx.Close() 107 + 108 + testIndexer("bleve", t, idx) 109 + } 110 + 111 + func TestESIndexAndSearch(t *testing.T) { 112 + unittest.PrepareTestEnv(t) 113 + 114 + u := os.Getenv("TEST_INDEXER_CODE_ES_URL") 115 + if u == "" { 116 + t.SkipNow() 117 + return 118 + } 119 + 120 + indexer := elasticsearch.NewIndexer(u, "gitea_codes") 121 + if _, err := indexer.Init(context.Background()); err != nil { 122 + assert.Fail(t, "Unable to init ES indexer Error: %v", err) 123 + if indexer != nil { 124 + indexer.Close() 125 + } 126 + return 127 + } 128 + 129 + defer indexer.Close() 130 + 131 + testIndexer("elastic_search", t, indexer) 132 + }
+43
modules/indexer/code/internal/indexer.go
··· 1 + // Copyright 2023 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package internal 5 + 6 + import ( 7 + "context" 8 + "fmt" 9 + 10 + repo_model "code.gitea.io/gitea/models/repo" 11 + "code.gitea.io/gitea/modules/indexer/internal" 12 + ) 13 + 14 + // Indexer defines an interface to index and search code contents 15 + type Indexer interface { 16 + internal.Indexer 17 + Index(ctx context.Context, repo *repo_model.Repository, sha string, changes *RepoChanges) error 18 + Delete(ctx context.Context, repoID int64) error 19 + Search(ctx context.Context, repoIDs []int64, language, keyword string, page, pageSize int, isMatch bool) (int64, []*SearchResult, []*SearchResultLanguages, error) 20 + } 21 + 22 + // NewDummyIndexer returns a dummy indexer 23 + func NewDummyIndexer() Indexer { 24 + return &dummyIndexer{ 25 + Indexer: internal.NewDummyIndexer(), 26 + } 27 + } 28 + 29 + type dummyIndexer struct { 30 + internal.Indexer 31 + } 32 + 33 + func (d *dummyIndexer) Index(ctx context.Context, repo *repo_model.Repository, sha string, changes *RepoChanges) error { 34 + return fmt.Errorf("indexer is not ready") 35 + } 36 + 37 + func (d *dummyIndexer) Delete(ctx context.Context, repoID int64) error { 38 + return fmt.Errorf("indexer is not ready") 39 + } 40 + 41 + func (d *dummyIndexer) Search(ctx context.Context, repoIDs []int64, language, keyword string, page, pageSize int, isMatch bool) (int64, []*SearchResult, []*SearchResultLanguages, error) { 42 + return 0, nil, nil, fmt.Errorf("indexer is not ready") 43 + }
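The dummy indexer above pairs with the `atomic.Pointer` in `indexer.go`: `globalIndexer` always holds a usable `Indexer`, starting as the dummy and being swapped for the real implementation once `Init` succeeds, so callers never need a nil check. A minimal standalone sketch of that swap pattern (the one-method interface and the names here are illustrative, not Gitea's):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Indexer is a trimmed-down stand-in for the real interface.
type Indexer interface {
	Search(keyword string) (int, error)
}

// dummyIndexer rejects every call until the real indexer is ready.
type dummyIndexer struct{}

func (dummyIndexer) Search(string) (int, error) {
	return 0, fmt.Errorf("indexer is not ready")
}

// realIndexer stands in for the bleve/elasticsearch implementation.
type realIndexer struct{}

func (realIndexer) Search(string) (int, error) { return 42, nil }

var global atomic.Pointer[Indexer]

func store(i Indexer) { global.Store(&i) }

func main() {
	store(dummyIndexer{}) // init(): always safe to call before Init finishes
	if _, err := (*global.Load()).Search("x"); err != nil {
		fmt.Println("before init:", err)
	}
	store(realIndexer{}) // Init(): atomically swap in the ready indexer
	n, _ := (*global.Load()).Search("x")
	fmt.Println("after init:", n)
}
```

Because the pointer is swapped atomically, queue handlers loading `globalIndexer` concurrently either see the dummy (and retry later) or the real indexer, never a partially initialized one.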
+44
modules/indexer/code/internal/model.go
··· 1 + // Copyright 2023 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package internal 5 + 6 + import "code.gitea.io/gitea/modules/timeutil" 7 + 8 + type FileUpdate struct { 9 + Filename string 10 + BlobSha string 11 + Size int64 12 + Sized bool 13 + } 14 + 15 + // RepoChanges changes (file additions/updates/removals) to a repo 16 + type RepoChanges struct { 17 + Updates []FileUpdate 18 + RemovedFilenames []string 19 + } 20 + 21 + // IndexerData represents data stored in the code indexer 22 + type IndexerData struct { 23 + RepoID int64 24 + } 25 + 26 + // SearchResult result of performing a search in a repo 27 + type SearchResult struct { 28 + RepoID int64 29 + StartIndex int 30 + EndIndex int 31 + Filename string 32 + Content string 33 + CommitID string 34 + UpdatedUnix timeutil.TimeStamp 35 + Language string 36 + Color string 37 + } 38 + 39 + // SearchResultLanguages result of top languages count in search results 40 + type SearchResultLanguages struct { 41 + Language string 42 + Color string 43 + Count int 44 + }
+32
modules/indexer/code/internal/util.go
··· 1 + // Copyright 2023 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package internal 5 + 6 + import ( 7 + "strings" 8 + 9 + "code.gitea.io/gitea/modules/indexer/internal" 10 + "code.gitea.io/gitea/modules/log" 11 + ) 12 + 13 + func FilenameIndexerID(repoID int64, filename string) string { 14 + return internal.Base36(repoID) + "_" + filename 15 + } 16 + 17 + func ParseIndexerID(indexerID string) (int64, string) { 18 + index := strings.IndexByte(indexerID, '_') 19 + if index == -1 { 20 + log.Error("Unexpected ID in repo indexer: %s", indexerID) 21 + } 22 + repoID, _ := internal.ParseBase36(indexerID[:index]) 23 + return repoID, indexerID[index+1:] 24 + } 25 + 26 + func FilenameOfIndexerID(indexerID string) string { 27 + index := strings.IndexByte(indexerID, '_') 28 + if index == -1 { 29 + log.Error("Unexpected ID in repo indexer: %s", indexerID) 30 + } 31 + return indexerID[index+1:] 32 + }
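The `FilenameIndexerID`/`ParseIndexerID` pair above encodes a code-indexer document ID as the base36-encoded repo ID, an underscore, then the filename; since base36 digits never contain `_`, the first underscore is always the separator even when the filename itself contains underscores. A minimal standalone sketch of that round trip (the lowercase function names here are illustrative re-implementations, not the package's exported API):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// filenameIndexerID mirrors the scheme above: base36(repoID) + "_" + filename.
func filenameIndexerID(repoID int64, filename string) string {
	return strconv.FormatInt(repoID, 36) + "_" + filename
}

// parseIndexerID splits the document ID back into repo ID and filename.
func parseIndexerID(indexerID string) (int64, string, error) {
	idx := strings.IndexByte(indexerID, '_')
	if idx == -1 {
		return 0, "", fmt.Errorf("unexpected ID in repo indexer: %s", indexerID)
	}
	repoID, err := strconv.ParseInt(indexerID[:idx], 36, 64)
	return repoID, indexerID[idx+1:], err
}

func main() {
	id := filenameIndexerID(62, "cmd/main.go")
	fmt.Println(id) // 1q_cmd/main.go (62 in base36 is "1q")
	repoID, name, _ := parseIndexerID(id)
	fmt.Println(repoID, name)
}
```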
+6 -3
modules/indexer/code/search.go
··· 9 9 "strings" 10 10 11 11 "code.gitea.io/gitea/modules/highlight" 12 + "code.gitea.io/gitea/modules/indexer/code/internal" 12 13 "code.gitea.io/gitea/modules/timeutil" 13 14 "code.gitea.io/gitea/modules/util" 14 15 ) ··· 24 25 LineNumbers []int 25 26 FormattedLines string 26 27 } 28 + 29 + type SearchResultLanguages = internal.SearchResultLanguages 27 30 28 31 func indices(content string, selectionStartIndex, selectionEndIndex int) (int, int) { 29 32 startIndex := selectionStartIndex ··· 61 64 return nil 62 65 } 63 66 64 - func searchResult(result *SearchResult, startIndex, endIndex int) (*Result, error) { 67 + func searchResult(result *internal.SearchResult, startIndex, endIndex int) (*Result, error) { 65 68 startLineNum := 1 + strings.Count(result.Content[:startIndex], "\n") 66 69 67 70 var formattedLinesBuffer bytes.Buffer ··· 109 112 } 110 113 111 114 // PerformSearch perform a search on a repository 112 - func PerformSearch(ctx context.Context, repoIDs []int64, language, keyword string, page, pageSize int, isMatch bool) (int, []*Result, []*SearchResultLanguages, error) { 115 + func PerformSearch(ctx context.Context, repoIDs []int64, language, keyword string, page, pageSize int, isMatch bool) (int, []*Result, []*internal.SearchResultLanguages, error) { 113 116 if len(keyword) == 0 { 114 117 return 0, nil, nil, nil 115 118 } 116 119 117 - total, results, resultLanguages, err := indexer.Search(ctx, repoIDs, language, keyword, page, pageSize, isMatch) 120 + total, results, resultLanguages, err := (*globalIndexer.Load()).Search(ctx, repoIDs, language, keyword, page, pageSize, isMatch) 118 121 if err != nil { 119 122 return 0, nil, nil, err 120 123 }
-104
modules/indexer/code/wrapped.go
··· 1 - // Copyright 2019 The Gitea Authors. All rights reserved. 2 - // SPDX-License-Identifier: MIT 3 - 4 - package code 5 - 6 - import ( 7 - "context" 8 - "fmt" 9 - "sync" 10 - 11 - repo_model "code.gitea.io/gitea/models/repo" 12 - "code.gitea.io/gitea/modules/log" 13 - ) 14 - 15 - var indexer = newWrappedIndexer() 16 - 17 - // ErrWrappedIndexerClosed is the error returned if the indexer was closed before it was ready 18 - var ErrWrappedIndexerClosed = fmt.Errorf("Indexer closed before ready") 19 - 20 - type wrappedIndexer struct { 21 - internal Indexer 22 - lock sync.RWMutex 23 - cond *sync.Cond 24 - closed bool 25 - } 26 - 27 - func newWrappedIndexer() *wrappedIndexer { 28 - w := &wrappedIndexer{} 29 - w.cond = sync.NewCond(w.lock.RLocker()) 30 - return w 31 - } 32 - 33 - func (w *wrappedIndexer) set(indexer Indexer) { 34 - w.lock.Lock() 35 - defer w.lock.Unlock() 36 - if w.closed { 37 - // Too late! 38 - indexer.Close() 39 - } 40 - w.internal = indexer 41 - w.cond.Broadcast() 42 - } 43 - 44 - func (w *wrappedIndexer) get() (Indexer, error) { 45 - w.lock.RLock() 46 - defer w.lock.RUnlock() 47 - if w.internal == nil { 48 - if w.closed { 49 - return nil, ErrWrappedIndexerClosed 50 - } 51 - w.cond.Wait() 52 - if w.closed { 53 - return nil, ErrWrappedIndexerClosed 54 - } 55 - } 56 - return w.internal, nil 57 - } 58 - 59 - // Ping checks if elastic is available 60 - func (w *wrappedIndexer) Ping() bool { 61 - indexer, err := w.get() 62 - if err != nil { 63 - log.Warn("Failed to get indexer: %v", err) 64 - return false 65 - } 66 - return indexer.Ping() 67 - } 68 - 69 - func (w *wrappedIndexer) Index(ctx context.Context, repo *repo_model.Repository, sha string, changes *repoChanges) error { 70 - indexer, err := w.get() 71 - if err != nil { 72 - return err 73 - } 74 - return indexer.Index(ctx, repo, sha, changes) 75 - } 76 - 77 - func (w *wrappedIndexer) Delete(repoID int64) error { 78 - indexer, err := w.get() 79 - if err != nil { 80 - return err 81 - } 82 - return indexer.Delete(repoID) 83 - } 84 - 85 - func (w *wrappedIndexer) Search(ctx context.Context, repoIDs []int64, language, keyword string, page, pageSize int, isMatch bool) (int64, []*SearchResult, []*SearchResultLanguages, error) { 86 - indexer, err := w.get() 87 - if err != nil { 88 - return 0, nil, nil, err 89 - } 90 - return indexer.Search(ctx, repoIDs, language, keyword, page, pageSize, isMatch) 91 - } 92 - 93 - func (w *wrappedIndexer) Close() { 94 - w.lock.Lock() 95 - defer w.lock.Unlock() 96 - if w.closed { 97 - return 98 - } 99 - w.closed = true 100 - w.cond.Broadcast() 101 - if w.internal != nil { 102 - w.internal.Close() 103 - } 104 - }
+21
modules/indexer/internal/base32.go
··· 1 + // Copyright 2023 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package internal 5 + 6 + import ( 7 + "fmt" 8 + "strconv" 9 + ) 10 + 11 + func Base36(i int64) string { 12 + return strconv.FormatInt(i, 36) 13 + } 14 + 15 + func ParseBase36(s string) (int64, error) { 16 + i, err := strconv.ParseInt(s, 36, 64) 17 + if err != nil { 18 + return 0, fmt.Errorf("invalid base36 integer %q: %w", s, err) 19 + } 20 + return i, nil 21 + }
+103
modules/indexer/internal/bleve/indexer.go
··· 1 + // Copyright 2023 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package bleve 5 + 6 + import ( 7 + "context" 8 + "fmt" 9 + 10 + "code.gitea.io/gitea/modules/indexer/internal" 11 + "code.gitea.io/gitea/modules/log" 12 + 13 + "github.com/blevesearch/bleve/v2" 14 + "github.com/blevesearch/bleve/v2/mapping" 15 + "github.com/ethantkoenig/rupture" 16 + ) 17 + 18 + var _ internal.Indexer = &Indexer{} 19 + 20 + // Indexer represents a basic bleve indexer implementation 21 + type Indexer struct { 22 + Indexer bleve.Index 23 + 24 + indexDir string 25 + version int 26 + mappingGetter MappingGetter 27 + } 28 + 29 + type MappingGetter func() (mapping.IndexMapping, error) 30 + 31 + func NewIndexer(indexDir string, version int, mappingGetter func() (mapping.IndexMapping, error)) *Indexer { 32 + return &Indexer{ 33 + indexDir: indexDir, 34 + version: version, 35 + mappingGetter: mappingGetter, 36 + } 37 + } 38 + 39 + // Init initializes the indexer 40 + func (i *Indexer) Init(_ context.Context) (bool, error) { 41 + if i == nil { 42 + return false, fmt.Errorf("cannot init nil indexer") 43 + } 44 + 45 + if i.Indexer != nil { 46 + return false, fmt.Errorf("indexer is already initialized") 47 + } 48 + 49 + indexer, version, err := openIndexer(i.indexDir, i.version) 50 + if err != nil { 51 + return false, err 52 + } 53 + if indexer != nil { 54 + i.Indexer = indexer 55 + return true, nil 56 + } 57 + 58 + if version != 0 { 59 + log.Warn("Found older bleve index with version %d, Gitea will remove it and rebuild", version) 60 + } 61 + 62 + indexMapping, err := i.mappingGetter() 63 + if err != nil { 64 + return false, err 65 + } 66 + 67 + indexer, err = bleve.New(i.indexDir, indexMapping) 68 + if err != nil { 69 + return false, err 70 + } 71 + 72 + if err = rupture.WriteIndexMetadata(i.indexDir, &rupture.IndexMetadata{ 73 + Version: i.version, 74 + }); err != nil { 75 + return false, err 76 + } 77 + 78 + i.Indexer = indexer 79 + 80 + return false, nil 81 + } 82 + 83 + // Ping checks if the indexer is available 84 + func (i *Indexer) Ping(_ context.Context) error { 85 + if i == nil { 86 + return fmt.Errorf("cannot ping nil indexer") 87 + } 88 + if i.Indexer == nil { 89 + return fmt.Errorf("indexer is not initialized") 90 + } 91 + return nil 92 + } 93 + 94 + func (i *Indexer) Close() { 95 + if i == nil { 96 + return 97 + } 98 + 99 + if err := i.Indexer.Close(); err != nil { 100 + log.Error("Failed to close bleve indexer in %q: %v", i.indexDir, err) 101 + } 102 + i.Indexer = nil 103 + }
+49
modules/indexer/internal/bleve/util.go
··· 1 + // Copyright 2023 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package bleve 5 + 6 + import ( 7 + "errors" 8 + "os" 9 + 10 + "code.gitea.io/gitea/modules/log" 11 + "code.gitea.io/gitea/modules/util" 12 + 13 + "github.com/blevesearch/bleve/v2" 14 + "github.com/blevesearch/bleve/v2/index/upsidedown" 15 + "github.com/ethantkoenig/rupture" 16 + ) 17 + 18 + // openIndexer opens the index at the specified path, checking for metadata 19 + // updates and bleve version updates. If the index needs to be created (or 20 + // re-created), it returns a nil index along with the stale on-disk version, if any 21 + func openIndexer(path string, latestVersion int) (bleve.Index, int, error) { 22 + _, err := os.Stat(path) 23 + if err != nil && os.IsNotExist(err) { 24 + return nil, 0, nil 25 + } else if err != nil { 26 + return nil, 0, err 27 + } 28 + 29 + metadata, err := rupture.ReadIndexMetadata(path) 30 + if err != nil { 31 + return nil, 0, err 32 + } 33 + if metadata.Version < latestVersion { 34 + // the indexer is using a previous version, so we should delete it and 35 + // re-populate 36 + return nil, metadata.Version, util.RemoveAll(path) 37 + } 38 + 39 + index, err := bleve.Open(path) 40 + if err != nil { 41 + if errors.Is(err, upsidedown.IncompatibleVersion) { 42 + log.Warn("Indexer was built with a previous version of bleve, deleting and rebuilding") 43 + return nil, 0, util.RemoveAll(path) 44 + } 45 + return nil, 0, err 46 + } 47 + 48 + return index, 0, nil 49 + }
+33
modules/indexer/internal/db/indexer.go
··· 1 + // Copyright 2023 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package db 5 + 6 + import ( 7 + "context" 8 + 9 + "code.gitea.io/gitea/modules/indexer/internal" 10 + ) 11 + 12 + var _ internal.Indexer = &Indexer{} 13 + 14 + // Indexer represents a basic db indexer implementation 15 + type Indexer struct{} 16 + 17 + // Init initializes the indexer 18 + func (i *Indexer) Init(_ context.Context) (bool, error) { 19 + // nothing to do 20 + return false, nil 21 + } 22 + 23 + // Ping checks if the indexer is available 24 + func (i *Indexer) Ping(_ context.Context) error { 25 + // No need to ping database to check if it is available. 26 + // If the database goes down, Gitea will go down, so nobody will care if the indexer is available. 27 + return nil 28 + } 29 + 30 + // Close closes the indexer 31 + func (i *Indexer) Close() { 32 + // nothing to do 33 + }
+92
modules/indexer/internal/elasticsearch/indexer.go
··· 1 + // Copyright 2023 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package elasticsearch 5 + 6 + import ( 7 + "context" 8 + "fmt" 9 + 10 + "code.gitea.io/gitea/modules/indexer/internal" 11 + 12 + "github.com/olivere/elastic/v7" 13 + ) 14 + 15 + var _ internal.Indexer = &Indexer{} 16 + 17 + // Indexer represents a basic elasticsearch indexer implementation 18 + type Indexer struct { 19 + Client *elastic.Client 20 + 21 + url string 22 + indexName string 23 + version int 24 + mapping string 25 + } 26 + 27 + func NewIndexer(url, indexName string, version int, mapping string) *Indexer { 28 + return &Indexer{ 29 + url: url, 30 + indexName: indexName, 31 + version: version, 32 + mapping: mapping, 33 + } 34 + } 35 + 36 + // Init initializes the indexer 37 + func (i *Indexer) Init(ctx context.Context) (bool, error) { 38 + if i == nil { 39 + return false, fmt.Errorf("cannot init nil indexer") 40 + } 41 + if i.Client != nil { 42 + return false, fmt.Errorf("indexer is already initialized") 43 + } 44 + 45 + client, err := i.initClient() 46 + if err != nil { 47 + return false, err 48 + } 49 + i.Client = client 50 + 51 + exists, err := i.Client.IndexExists(i.VersionedIndexName()).Do(ctx) 52 + if err != nil { 53 + return false, err 54 + } 55 + if exists { 56 + return true, nil 57 + } 58 + 59 + if err := i.createIndex(ctx); err != nil { 60 + return false, err 61 + } 62 + 63 + return exists, nil 64 + } 65 + 66 + // Ping checks if the indexer is available 67 + func (i *Indexer) Ping(ctx context.Context) error { 68 + if i == nil { 69 + return fmt.Errorf("cannot ping nil indexer") 70 + } 71 + if i.Client == nil { 72 + return fmt.Errorf("indexer is not initialized") 73 + } 74 + 75 + resp, err := i.Client.ClusterHealth().Do(ctx) 76 + if err != nil { 77 + return err 78 + } 79 + if resp.Status != "green" { 80 + // see https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html 81 + return fmt.Errorf("status of elasticsearch cluster is %s", resp.Status) 82 + } 83 + return nil 84 + } 85 + 86 + // Close closes the indexer 87 + func (i *Indexer) Close() { 88 + if i == nil { 89 + return 90 + } 91 + i.Client = nil 92 + }
+68
modules/indexer/internal/elasticsearch/util.go
··· 1 + // Copyright 2023 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package elasticsearch 5 + 6 + import ( 7 + "context" 8 + "fmt" 9 + "time" 10 + 11 + "code.gitea.io/gitea/modules/log" 12 + 13 + "github.com/olivere/elastic/v7" 14 + ) 15 + 16 + // VersionedIndexName returns the full index name with version 17 + func (i *Indexer) VersionedIndexName() string { 18 + return versionedIndexName(i.indexName, i.version) 19 + } 20 + 21 + func versionedIndexName(indexName string, version int) string { 22 + if version == 0 { 23 + // Old index name without version 24 + return indexName 25 + } 26 + return fmt.Sprintf("%s.v%d", indexName, version) 27 + } 28 + 29 + func (i *Indexer) createIndex(ctx context.Context) error { 30 + createIndex, err := i.Client.CreateIndex(i.VersionedIndexName()).BodyString(i.mapping).Do(ctx) 31 + if err != nil { 32 + return err 33 + } 34 + if !createIndex.Acknowledged { 35 + return fmt.Errorf("create index %s with %s failed", i.VersionedIndexName(), i.mapping) 36 + } 37 + 38 + i.checkOldIndexes(ctx) 39 + 40 + return nil 41 + } 42 + 43 + func (i *Indexer) initClient() (*elastic.Client, error) { 44 + opts := []elastic.ClientOptionFunc{ 45 + elastic.SetURL(i.url), 46 + elastic.SetSniff(false), 47 + elastic.SetHealthcheckInterval(10 * time.Second), 48 + elastic.SetGzip(false), 49 + } 50 + 51 + logger := log.GetLogger(log.DEFAULT) 52 + 53 + opts = append(opts, elastic.SetTraceLog(&log.PrintfLogger{Logf: logger.Trace})) 54 + opts = append(opts, elastic.SetInfoLog(&log.PrintfLogger{Logf: logger.Info})) 55 + opts = append(opts, elastic.SetErrorLog(&log.PrintfLogger{Logf: logger.Error})) 56 + 57 + return elastic.NewClient(opts...) 58 + } 59 + 60 + func (i *Indexer) checkOldIndexes(ctx context.Context) { 61 + for v := 0; v < i.version; v++ { 62 + indexName := versionedIndexName(i.indexName, v) 63 + exists, err := i.Client.IndexExists(indexName).Do(ctx) 64 + if err == nil && exists { 65 + log.Warn("Found older elasticsearch index named %q; Gitea will keep it and will NOT delete it. You can remove the old index after the upgrade succeeds.", indexName) 66 + } 67 + } 68 + }
+37
modules/indexer/internal/indexer.go
··· 1 + // Copyright 2023 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package internal 5 + 6 + import ( 7 + "context" 8 + "fmt" 9 + ) 10 + 11 + // Indexer defines a basic indexer interface 12 + type Indexer interface { 13 + // Init initializes the indexer 14 + // returns true if the index was opened/existed (with data populated), false if it was created/not-existed (with no data) 15 + Init(ctx context.Context) (bool, error) 16 + // Ping checks if the indexer is available 17 + Ping(ctx context.Context) error 18 + // Close closes the indexer 19 + Close() 20 + } 21 + 22 + // NewDummyIndexer returns a dummy indexer 23 + func NewDummyIndexer() Indexer { 24 + return &dummyIndexer{} 25 + } 26 + 27 + type dummyIndexer struct{} 28 + 29 + func (d *dummyIndexer) Init(ctx context.Context) (bool, error) { 30 + return false, fmt.Errorf("indexer is not ready") 31 + } 32 + 33 + func (d *dummyIndexer) Ping(ctx context.Context) error { 34 + return fmt.Errorf("indexer is not ready") 35 + } 36 + 37 + func (d *dummyIndexer) Close() {}
+92
modules/indexer/internal/meilisearch/indexer.go
··· 1 + // Copyright 2023 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package meilisearch 5 + 6 + import ( 7 + "context" 8 + "fmt" 9 + 10 + "github.com/meilisearch/meilisearch-go" 11 + ) 12 + 13 + // Indexer represents a basic meilisearch indexer implementation 14 + type Indexer struct { 15 + Client *meilisearch.Client 16 + 17 + url, apiKey string 18 + indexName string 19 + version int 20 + } 21 + 22 + func NewIndexer(url, apiKey, indexName string, version int) *Indexer { 23 + return &Indexer{ 24 + url: url, 25 + apiKey: apiKey, 26 + indexName: indexName, 27 + version: version, 28 + } 29 + } 30 + 31 + // Init initializes the indexer 32 + func (i *Indexer) Init(_ context.Context) (bool, error) { 33 + if i == nil { 34 + return false, fmt.Errorf("cannot init nil indexer") 35 + } 36 + 37 + if i.Client != nil { 38 + return false, fmt.Errorf("indexer is already initialized") 39 + } 40 + 41 + i.Client = meilisearch.NewClient(meilisearch.ClientConfig{ 42 + Host: i.url, 43 + APIKey: i.apiKey, 44 + }) 45 + 46 + _, err := i.Client.GetIndex(i.VersionedIndexName()) 47 + if err == nil { 48 + return true, nil 49 + } 50 + _, err = i.Client.CreateIndex(&meilisearch.IndexConfig{ 51 + Uid: i.VersionedIndexName(), 52 + PrimaryKey: "id", 53 + }) 54 + if err != nil { 55 + return false, err 56 + } 57 + 58 + i.checkOldIndexes() 59 + 60 + _, err = i.Client.Index(i.VersionedIndexName()).UpdateFilterableAttributes(&[]string{"repo_id"}) 61 + return false, err 62 + } 63 + 64 + // Ping checks if the indexer is available 65 + func (i *Indexer) Ping(ctx context.Context) error { 66 + if i == nil { 67 + return fmt.Errorf("cannot ping nil indexer") 68 + } 69 + if i.Client == nil { 70 + return fmt.Errorf("indexer is not initialized") 71 + } 72 + resp, err := i.Client.Health() 73 + if err != nil { 74 + return err 75 + } 76 + if resp.Status != "available" { 77 + // See https://docs.meilisearch.com/reference/api/health.html#status 78 + return fmt.Errorf("status of meilisearch is not available: %s", resp.Status) 79 + } 80 + return nil 81 + } 82 + 83 + // Close closes the indexer 84 + func (i *Indexer) Close() { 85 + if i == nil { 86 + return 87 + } 88 + if i.Client == nil { 89 + return 90 + } 91 + i.Client = nil 92 + }
+38
modules/indexer/internal/meilisearch/util.go
··· 1 + // Copyright 2023 The Gitea Authors. All rights reserved. 2 + // SPDX-License-Identifier: MIT 3 + 4 + package meilisearch 5 + 6 + import ( 7 + "fmt" 8 + 9 + "code.gitea.io/gitea/modules/log" 10 + ) 11 + 12 + // VersionedIndexName returns the full index name with version 13 + func (i *Indexer) VersionedIndexName() string { 14 + return versionedIndexName(i.indexName, i.version) 15 + } 16 + 17 + func versionedIndexName(indexName string, version int) string { 18 + if version == 0 { 19 + // Old index name without version 20 + return indexName 21 + } 22 + 23 + // The format of the index name is <index_name>_v<version>, not <index_name>.v<version> like elasticsearch, 24 + // because meilisearch does not support "." in index names: an index UID may contain only alphanumeric characters, hyphens (-) and underscores (_). 25 + // See https://www.meilisearch.com/docs/learn/core_concepts/indexes#index-uid 26 + 27 + return fmt.Sprintf("%s_v%d", indexName, version) 28 + } 29 + 30 + func (i *Indexer) checkOldIndexes() { 31 + for v := 0; v < i.version; v++ { 32 + indexName := versionedIndexName(i.indexName, v) 33 + _, err := i.Client.GetIndex(indexName) 34 + if err == nil { 35 + log.Warn("Found older meilisearch index named %q; Gitea will keep it and will NOT delete it. You can remove the old index after the upgrade succeeds.", indexName) 36 + } 37 + } 38 + }
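Both engines derive the physical index name from the logical name plus the schema version, differing only in separator (elasticsearch allows `.`, meilisearch does not), and version 0 maps to the bare legacy name so pre-existing unversioned indexes are still found. A sketch of the naming rule; the `sep` parameter is an illustrative generalization, whereas each real implementation hard-codes its own separator:

```go
package main

import "fmt"

// versionedIndexName reproduces the rule above: version 0 keeps the bare
// legacy name; otherwise append v<version> with the engine's separator
// ("." for elasticsearch, "_" for meilisearch).
func versionedIndexName(indexName string, version int, sep string) string {
	if version == 0 {
		return indexName
	}
	return fmt.Sprintf("%s%sv%d", indexName, sep, version)
}

func main() {
	fmt.Println(versionedIndexName("gitea_codes", 1, "."))  // gitea_codes.v1
	fmt.Println(versionedIndexName("gitea_issues", 2, "_")) // gitea_issues_v2
	fmt.Println(versionedIndexName("gitea_issues", 0, "_")) // gitea_issues
}
```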
-276
modules/indexer/issues/bleve.go
··· 1 - // Copyright 2018 The Gitea Authors. All rights reserved. 2 - // SPDX-License-Identifier: MIT 3 - 4 - package issues 5 - 6 - import ( 7 - "context" 8 - "fmt" 9 - "os" 10 - "strconv" 11 - 12 - gitea_bleve "code.gitea.io/gitea/modules/indexer/bleve" 13 - "code.gitea.io/gitea/modules/log" 14 - "code.gitea.io/gitea/modules/util" 15 - 16 - "github.com/blevesearch/bleve/v2" 17 - "github.com/blevesearch/bleve/v2/analysis/analyzer/custom" 18 - "github.com/blevesearch/bleve/v2/analysis/token/camelcase" 19 - "github.com/blevesearch/bleve/v2/analysis/token/lowercase" 20 - "github.com/blevesearch/bleve/v2/analysis/token/unicodenorm" 21 - "github.com/blevesearch/bleve/v2/analysis/tokenizer/unicode" 22 - "github.com/blevesearch/bleve/v2/index/upsidedown" 23 - "github.com/blevesearch/bleve/v2/mapping" 24 - "github.com/blevesearch/bleve/v2/search/query" 25 - "github.com/ethantkoenig/rupture" 26 - ) 27 - 28 - const ( 29 - issueIndexerAnalyzer = "issueIndexer" 30 - issueIndexerDocType = "issueIndexerDocType" 31 - issueIndexerLatestVersion = 2 32 - ) 33 - 34 - // indexerID a bleve-compatible unique identifier for an integer id 35 - func indexerID(id int64) string { 36 - return strconv.FormatInt(id, 36) 37 - } 38 - 39 - // idOfIndexerID the integer id associated with an indexer id 40 - func idOfIndexerID(indexerID string) (int64, error) { 41 - id, err := strconv.ParseInt(indexerID, 36, 64) 42 - if err != nil { 43 - return 0, fmt.Errorf("Unexpected indexer ID %s: %w", indexerID, err) 44 - } 45 - return id, nil 46 - } 47 - 48 - // numericEqualityQuery a numeric equality query for the given value and field 49 - func numericEqualityQuery(value int64, field string) *query.NumericRangeQuery { 50 - f := float64(value) 51 - tru := true 52 - q := bleve.NewNumericRangeInclusiveQuery(&f, &f, &tru, &tru) 53 - q.SetField(field) 54 - return q 55 - } 56 - 57 - func newMatchPhraseQuery(matchPhrase, field, analyzer string) *query.MatchPhraseQuery { 58 - q := bleve.NewMatchPhraseQuery(matchPhrase) 59 - q.FieldVal = field 60 - q.Analyzer = analyzer 61 - return q 62 - } 63 - 64 - const unicodeNormalizeName = "unicodeNormalize" 65 - 66 - func addUnicodeNormalizeTokenFilter(m *mapping.IndexMappingImpl) error { 67 - return m.AddCustomTokenFilter(unicodeNormalizeName, map[string]interface{}{ 68 - "type": unicodenorm.Name, 69 - "form": unicodenorm.NFC, 70 - }) 71 - } 72 - 73 - const maxBatchSize = 16 74 - 75 - // openIndexer open the index at the specified path, checking for metadata 76 - // updates and bleve version updates. If index needs to be created (or 77 - // re-created), returns (nil, nil) 78 - func openIndexer(path string, latestVersion int) (bleve.Index, error) { 79 - _, err := os.Stat(path) 80 - if err != nil && os.IsNotExist(err) { 81 - return nil, nil 82 - } else if err != nil { 83 - return nil, err 84 - } 85 - 86 - metadata, err := rupture.ReadIndexMetadata(path) 87 - if err != nil { 88 - return nil, err 89 - } 90 - if metadata.Version < latestVersion { 91 - // the indexer is using a previous version, so we should delete it and 92 - // re-populate 93 - return nil, util.RemoveAll(path) 94 - } 95 - 96 - index, err := bleve.Open(path) 97 - if err != nil && err == upsidedown.IncompatibleVersion { 98 - // the indexer was built with a previous version of bleve, so we should 99 - // delete it and re-populate 100 - return nil, util.RemoveAll(path) 101 - } else if err != nil { 102 - return nil, err 103 - } 104 - 105 - return index, nil 106 - } 107 - 108 - // BleveIndexerData an update to the issue indexer 109 - type BleveIndexerData IndexerData 110 - 111 - // Type returns the document type, for bleve's mapping.Classifier interface. 
112 - func (i *BleveIndexerData) Type() string { 113 - return issueIndexerDocType 114 - } 115 - 116 - // createIssueIndexer create an issue indexer if one does not already exist 117 - func createIssueIndexer(path string, latestVersion int) (bleve.Index, error) { 118 - mapping := bleve.NewIndexMapping() 119 - docMapping := bleve.NewDocumentMapping() 120 - 121 - numericFieldMapping := bleve.NewNumericFieldMapping() 122 - numericFieldMapping.IncludeInAll = false 123 - docMapping.AddFieldMappingsAt("RepoID", numericFieldMapping) 124 - 125 - textFieldMapping := bleve.NewTextFieldMapping() 126 - textFieldMapping.Store = false 127 - textFieldMapping.IncludeInAll = false 128 - docMapping.AddFieldMappingsAt("Title", textFieldMapping) 129 - docMapping.AddFieldMappingsAt("Content", textFieldMapping) 130 - docMapping.AddFieldMappingsAt("Comments", textFieldMapping) 131 - 132 - if err := addUnicodeNormalizeTokenFilter(mapping); err != nil { 133 - return nil, err 134 - } else if err = mapping.AddCustomAnalyzer(issueIndexerAnalyzer, map[string]interface{}{ 135 - "type": custom.Name, 136 - "char_filters": []string{}, 137 - "tokenizer": unicode.Name, 138 - "token_filters": []string{unicodeNormalizeName, camelcase.Name, lowercase.Name}, 139 - }); err != nil { 140 - return nil, err 141 - } 142 - 143 - mapping.DefaultAnalyzer = issueIndexerAnalyzer 144 - mapping.AddDocumentMapping(issueIndexerDocType, docMapping) 145 - mapping.AddDocumentMapping("_all", bleve.NewDocumentDisabledMapping()) 146 - 147 - index, err := bleve.New(path, mapping) 148 - if err != nil { 149 - return nil, err 150 - } 151 - 152 - if err = rupture.WriteIndexMetadata(path, &rupture.IndexMetadata{ 153 - Version: latestVersion, 154 - }); err != nil { 155 - return nil, err 156 - } 157 - return index, nil 158 - } 159 - 160 - var _ Indexer = &BleveIndexer{} 161 - 162 - // BleveIndexer implements Indexer interface 163 - type BleveIndexer struct { 164 - indexDir string 165 - indexer bleve.Index 166 - } 167 - 168 - // NewBleveIndexer creates a new bleve local indexer 169 - func NewBleveIndexer(indexDir string) *BleveIndexer { 170 - return &BleveIndexer{ 171 - indexDir: indexDir, 172 - } 173 - } 174 - 175 - // Init will initialize the indexer 176 - func (b *BleveIndexer) Init() (bool, error) { 177 - var err error 178 - b.indexer, err = openIndexer(b.indexDir, issueIndexerLatestVersion) 179 - if err != nil { 180 - return false, err 181 - } 182 - if b.indexer != nil { 183 - return true, nil 184 - } 185 - 186 - b.indexer, err = createIssueIndexer(b.indexDir, issueIndexerLatestVersion) 187 - return false, err 188 - } 189 - 190 - // Ping does nothing 191 - func (b *BleveIndexer) Ping() bool { 192 - return true 193 - } 194 - 195 - // Close will close the bleve indexer 196 - func (b *BleveIndexer) Close() { 197 - if b.indexer != nil { 198 - if err := b.indexer.Close(); err != nil { 199 - log.Error("Error whilst closing indexer: %v", err) 200 - } 201 - } 202 - } 203 - 204 - // Index will save the index data 205 - func (b *BleveIndexer) Index(issues []*IndexerData) error { 206 - batch := gitea_bleve.NewFlushingBatch(b.indexer, maxBatchSize) 207 - for _, issue := range issues { 208 - if err := batch.Index(indexerID(issue.ID), struct { 209 - RepoID int64 210 - Title string 211 - Content string 212 - Comments []string 213 - }{ 214 - RepoID: issue.RepoID, 215 - Title: issue.Title, 216 - Content: issue.Content, 217 - Comments: issue.Comments, 218 - }); err != nil { 219 - return err 220 - } 221 - } 222 - return batch.Flush() 223 - } 224 - 225 - // Delete deletes indexes by ids 226 - func (b *BleveIndexer) Delete(ids ...int64) error { 227 - batch := gitea_bleve.NewFlushingBatch(b.indexer, maxBatchSize) 228 - for _, id := range ids { 229 - if err := batch.Delete(indexerID(id)); err != nil { 230 - return err 231 - } 232 - } 233 - return batch.Flush() 234 - } 235 - 236 - // Search searches for issues by given conditions. 
237 - // Returns the matching issue IDs 238 - func (b *BleveIndexer) Search(ctx context.Context, keyword string, repoIDs []int64, limit, start int) (*SearchResult, error) { 239 - var repoQueriesP []*query.NumericRangeQuery 240 - for _, repoID := range repoIDs { 241 - repoQueriesP = append(repoQueriesP, numericEqualityQuery(repoID, "RepoID")) 242 - } 243 - repoQueries := make([]query.Query, len(repoQueriesP)) 244 - for i, v := range repoQueriesP { 245 - repoQueries[i] = query.Query(v) 246 - } 247 - 248 - indexerQuery := bleve.NewConjunctionQuery( 249 - bleve.NewDisjunctionQuery(repoQueries...), 250 - bleve.NewDisjunctionQuery( 251 - newMatchPhraseQuery(keyword, "Title", issueIndexerAnalyzer), 252 - newMatchPhraseQuery(keyword, "Content", issueIndexerAnalyzer), 253 - newMatchPhraseQuery(keyword, "Comments", issueIndexerAnalyzer), 254 - )) 255 - search := bleve.NewSearchRequestOptions(indexerQuery, limit, start, false) 256 - search.SortBy([]string{"-_score"}) 257 - 258 - result, err := b.indexer.SearchInContext(ctx, search) 259 - if err != nil { 260 - return nil, err 261 - } 262 - 263 - ret := SearchResult{ 264 - Hits: make([]Match, 0, len(result.Hits)), 265 - } 266 - for _, hit := range result.Hits { 267 - id, err := idOfIndexerID(hit.ID) 268 - if err != nil { 269 - return nil, err 270 - } 271 - ret.Hits = append(ret.Hits, Match{ 272 - ID: id, 273 - }) 274 - } 275 - return &ret, nil 276 - }
+187
modules/indexer/issues/bleve/bleve.go
```go
// Copyright 2018 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package bleve

import (
	"context"

	indexer_internal "code.gitea.io/gitea/modules/indexer/internal"
	inner_bleve "code.gitea.io/gitea/modules/indexer/internal/bleve"
	"code.gitea.io/gitea/modules/indexer/issues/internal"

	"github.com/blevesearch/bleve/v2"
	"github.com/blevesearch/bleve/v2/analysis/analyzer/custom"
	"github.com/blevesearch/bleve/v2/analysis/token/camelcase"
	"github.com/blevesearch/bleve/v2/analysis/token/lowercase"
	"github.com/blevesearch/bleve/v2/analysis/token/unicodenorm"
	"github.com/blevesearch/bleve/v2/analysis/tokenizer/unicode"
	"github.com/blevesearch/bleve/v2/mapping"
	"github.com/blevesearch/bleve/v2/search/query"
)

const (
	issueIndexerAnalyzer      = "issueIndexer"
	issueIndexerDocType       = "issueIndexerDocType"
	issueIndexerLatestVersion = 2
)

// numericEqualityQuery returns a numeric equality query for the given value and field
func numericEqualityQuery(value int64, field string) *query.NumericRangeQuery {
	f := float64(value)
	tru := true
	q := bleve.NewNumericRangeInclusiveQuery(&f, &f, &tru, &tru)
	q.SetField(field)
	return q
}

func newMatchPhraseQuery(matchPhrase, field, analyzer string) *query.MatchPhraseQuery {
	q := bleve.NewMatchPhraseQuery(matchPhrase)
	q.FieldVal = field
	q.Analyzer = analyzer
	return q
}

const unicodeNormalizeName = "unicodeNormalize"

func addUnicodeNormalizeTokenFilter(m *mapping.IndexMappingImpl) error {
	return m.AddCustomTokenFilter(unicodeNormalizeName, map[string]interface{}{
		"type": unicodenorm.Name,
		"form": unicodenorm.NFC,
	})
}

const maxBatchSize = 16

// IndexerData an update to the issue indexer
type IndexerData internal.IndexerData

// Type returns the document type, for bleve's mapping.Classifier interface.
func (i *IndexerData) Type() string {
	return issueIndexerDocType
}

// generateIssueIndexMapping generates the bleve index mapping for issues
func generateIssueIndexMapping() (mapping.IndexMapping, error) {
	mapping := bleve.NewIndexMapping()
	docMapping := bleve.NewDocumentMapping()

	numericFieldMapping := bleve.NewNumericFieldMapping()
	numericFieldMapping.IncludeInAll = false
	docMapping.AddFieldMappingsAt("RepoID", numericFieldMapping)

	textFieldMapping := bleve.NewTextFieldMapping()
	textFieldMapping.Store = false
	textFieldMapping.IncludeInAll = false
	docMapping.AddFieldMappingsAt("Title", textFieldMapping)
	docMapping.AddFieldMappingsAt("Content", textFieldMapping)
	docMapping.AddFieldMappingsAt("Comments", textFieldMapping)

	if err := addUnicodeNormalizeTokenFilter(mapping); err != nil {
		return nil, err
	} else if err = mapping.AddCustomAnalyzer(issueIndexerAnalyzer, map[string]interface{}{
		"type":          custom.Name,
		"char_filters":  []string{},
		"tokenizer":     unicode.Name,
		"token_filters": []string{unicodeNormalizeName, camelcase.Name, lowercase.Name},
	}); err != nil {
		return nil, err
	}

	mapping.DefaultAnalyzer = issueIndexerAnalyzer
	mapping.AddDocumentMapping(issueIndexerDocType, docMapping)
	mapping.AddDocumentMapping("_all", bleve.NewDocumentDisabledMapping())

	return mapping, nil
}

var _ internal.Indexer = &Indexer{}

// Indexer implements Indexer interface
type Indexer struct {
	inner                    *inner_bleve.Indexer
	indexer_internal.Indexer // do not composite inner_bleve.Indexer directly to avoid exposing too much
}

// NewIndexer creates a new bleve local indexer
func NewIndexer(indexDir string) *Indexer {
	inner := inner_bleve.NewIndexer(indexDir, issueIndexerLatestVersion, generateIssueIndexMapping)
	return &Indexer{
		Indexer: inner,
		inner:   inner,
	}
}

// Index will save the index data
func (b *Indexer) Index(_ context.Context, issues []*internal.IndexerData) error {
	batch := inner_bleve.NewFlushingBatch(b.inner.Indexer, maxBatchSize)
	for _, issue := range issues {
		if err := batch.Index(indexer_internal.Base36(issue.ID), struct {
			RepoID   int64
			Title    string
			Content  string
			Comments []string
		}{
			RepoID:   issue.RepoID,
			Title:    issue.Title,
			Content:  issue.Content,
			Comments: issue.Comments,
		}); err != nil {
			return err
		}
	}
	return batch.Flush()
}

// Delete deletes indexes by ids
func (b *Indexer) Delete(_ context.Context, ids ...int64) error {
	batch := inner_bleve.NewFlushingBatch(b.inner.Indexer, maxBatchSize)
	for _, id := range ids {
		if err := batch.Delete(indexer_internal.Base36(id)); err != nil {
			return err
		}
	}
	return batch.Flush()
}

// Search searches for issues by given conditions.
// Returns the matching issue IDs
func (b *Indexer) Search(ctx context.Context, keyword string, repoIDs []int64, limit, start int) (*internal.SearchResult, error) {
	var repoQueriesP []*query.NumericRangeQuery
	for _, repoID := range repoIDs {
		repoQueriesP = append(repoQueriesP, numericEqualityQuery(repoID, "RepoID"))
	}
	repoQueries := make([]query.Query, len(repoQueriesP))
	for i, v := range repoQueriesP {
		repoQueries[i] = query.Query(v)
	}

	indexerQuery := bleve.NewConjunctionQuery(
		bleve.NewDisjunctionQuery(repoQueries...),
		bleve.NewDisjunctionQuery(
			newMatchPhraseQuery(keyword, "Title", issueIndexerAnalyzer),
			newMatchPhraseQuery(keyword, "Content", issueIndexerAnalyzer),
			newMatchPhraseQuery(keyword, "Comments", issueIndexerAnalyzer),
		))
	search := bleve.NewSearchRequestOptions(indexerQuery, limit, start, false)
	search.SortBy([]string{"-_score"})

	result, err := b.inner.Indexer.SearchInContext(ctx, search)
	if err != nil {
		return nil, err
	}

	ret := internal.SearchResult{
		Hits: make([]internal.Match, 0, len(result.Hits)),
	}
	for _, hit := range result.Hits {
		id, err := indexer_internal.ParseBase36(hit.ID)
		if err != nil {
			return nil, err
		}
		ret.Hits = append(ret.Hits, internal.Match{
			ID: id,
		})
	}
	return &ret, nil
}
```
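The bleve implementation stores each issue under a base36-encoded document ID (`indexer_internal.Base36` when writing, `ParseBase36` when reading hits back). Those helpers live in `modules/indexer/internal` and are not part of this diff; a minimal stdlib-only sketch of the likely round-trip, assuming they are thin wrappers over `strconv`:

```go
package main

import (
	"fmt"
	"strconv"
)

// base36 encodes an int64 ID as a base36 string, mirroring what a helper
// like indexer_internal.Base36 would plausibly do (an assumption here).
func base36(i int64) string {
	return strconv.FormatInt(i, 36)
}

// parseBase36 is the inverse, recovering the numeric ID from a bleve hit ID.
func parseBase36(s string) (int64, error) {
	return strconv.ParseInt(s, 36, 64)
}

func main() {
	id := int64(1234567)
	doc := base36(id)
	back, err := parseBase36(doc)
	fmt.Println(doc, back, err)
}
```

Compact string IDs like this keep bleve document keys short while staying trivially reversible.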
+6 -4
modules/indexer/issues/bleve_test.go → modules/indexer/issues/bleve/bleve_test.go
```diff
 // Copyright 2018 The Gitea Authors. All rights reserved.
 // SPDX-License-Identifier: MIT
 
-package issues
+package bleve
 
 import (
 	"context"
 	"testing"
 
+	"code.gitea.io/gitea/modules/indexer/issues/internal"
+
 	"github.com/stretchr/testify/assert"
 )
 
 func TestBleveIndexAndSearch(t *testing.T) {
 	dir := t.TempDir()
-	indexer := NewBleveIndexer(dir)
+	indexer := NewIndexer(dir)
 	defer indexer.Close()
 
-	if _, err := indexer.Init(); err != nil {
+	if _, err := indexer.Init(context.Background()); err != nil {
 		assert.Fail(t, "Unable to initialize bleve indexer: %v", err)
 		return
 	}
 
-	err := indexer.Index([]*IndexerData{
+	err := indexer.Index(context.Background(), []*internal.IndexerData{
 		{
 			ID:     1,
 			RepoID: 2,
```
-56
modules/indexer/issues/db.go
```go
// Copyright 2019 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package issues

import (
	"context"

	"code.gitea.io/gitea/models/db"
	issues_model "code.gitea.io/gitea/models/issues"
)

// DBIndexer implements Indexer interface to use database's like search
type DBIndexer struct{}

// Init dummy function
func (i *DBIndexer) Init() (bool, error) {
	return false, nil
}

// Ping checks if database is available
func (i *DBIndexer) Ping() bool {
	return db.GetEngine(db.DefaultContext).Ping() != nil
}

// Index dummy function
func (i *DBIndexer) Index(issue []*IndexerData) error {
	return nil
}

// Delete dummy function
func (i *DBIndexer) Delete(ids ...int64) error {
	return nil
}

// Close dummy function
func (i *DBIndexer) Close() {
}

// Search dummy function
func (i *DBIndexer) Search(ctx context.Context, kw string, repoIDs []int64, limit, start int) (*SearchResult, error) {
	total, ids, err := issues_model.SearchIssueIDsByKeyword(ctx, kw, repoIDs, limit, start)
	if err != nil {
		return nil, err
	}
	result := SearchResult{
		Total: total,
		Hits:  make([]Match, 0, limit),
	}
	for _, id := range ids {
		result.Hits = append(result.Hits, Match{
			ID: id,
		})
	}
	return &result, nil
}
```
+54
modules/indexer/issues/db/db.go
```go
// Copyright 2019 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package db

import (
	"context"

	issues_model "code.gitea.io/gitea/models/issues"
	indexer_internal "code.gitea.io/gitea/modules/indexer/internal"
	inner_db "code.gitea.io/gitea/modules/indexer/internal/db"
	"code.gitea.io/gitea/modules/indexer/issues/internal"
)

var _ internal.Indexer = &Indexer{}

// Indexer implements Indexer interface to use database's like search
type Indexer struct {
	indexer_internal.Indexer
}

func NewIndexer() *Indexer {
	return &Indexer{
		Indexer: &inner_db.Indexer{},
	}
}

// Index dummy function
func (i *Indexer) Index(_ context.Context, _ []*internal.IndexerData) error {
	return nil
}

// Delete dummy function
func (i *Indexer) Delete(_ context.Context, _ ...int64) error {
	return nil
}

// Search searches for issues
func (i *Indexer) Search(ctx context.Context, kw string, repoIDs []int64, limit, start int) (*internal.SearchResult, error) {
	total, ids, err := issues_model.SearchIssueIDsByKeyword(ctx, kw, repoIDs, limit, start)
	if err != nil {
		return nil, err
	}
	result := internal.SearchResult{
		Total: total,
		Hits:  make([]internal.Match, 0, limit),
	}
	for _, id := range ids {
		result.Hits = append(result.Hits, internal.Match{
			ID: id,
		})
	}
	return &result, nil
}
```
-287
modules/indexer/issues/elastic_search.go
```go
// Copyright 2019 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package issues

import (
	"context"
	"errors"
	"fmt"
	"net"
	"strconv"
	"sync"
	"time"

	"code.gitea.io/gitea/modules/graceful"
	"code.gitea.io/gitea/modules/log"

	"github.com/olivere/elastic/v7"
)

var _ Indexer = &ElasticSearchIndexer{}

// ElasticSearchIndexer implements Indexer interface
type ElasticSearchIndexer struct {
	client      *elastic.Client
	indexerName string
	available   bool
	stopTimer   chan struct{}
	lock        sync.RWMutex
}

// NewElasticSearchIndexer creates a new elasticsearch indexer
func NewElasticSearchIndexer(url, indexerName string) (*ElasticSearchIndexer, error) {
	opts := []elastic.ClientOptionFunc{
		elastic.SetURL(url),
		elastic.SetSniff(false),
		elastic.SetHealthcheckInterval(10 * time.Second),
		elastic.SetGzip(false),
	}

	logger := log.GetLogger(log.DEFAULT)
	opts = append(opts, elastic.SetTraceLog(&log.PrintfLogger{Logf: logger.Trace}))
	opts = append(opts, elastic.SetInfoLog(&log.PrintfLogger{Logf: logger.Info}))
	opts = append(opts, elastic.SetErrorLog(&log.PrintfLogger{Logf: logger.Error}))

	client, err := elastic.NewClient(opts...)
	if err != nil {
		return nil, err
	}

	indexer := &ElasticSearchIndexer{
		client:      client,
		indexerName: indexerName,
		available:   true,
		stopTimer:   make(chan struct{}),
	}

	ticker := time.NewTicker(10 * time.Second)
	go func() {
		for {
			select {
			case <-ticker.C:
				indexer.checkAvailability()
			case <-indexer.stopTimer:
				ticker.Stop()
				return
			}
		}
	}()

	return indexer, nil
}

const (
	defaultMapping = `{
		"mappings": {
			"properties": {
				"id": {
					"type": "integer",
					"index": true
				},
				"repo_id": {
					"type": "integer",
					"index": true
				},
				"title": {
					"type": "text",
					"index": true
				},
				"content": {
					"type": "text",
					"index": true
				},
				"comments": {
					"type": "text",
					"index": true
				}
			}
		}
	}`
)

// Init will initialize the indexer
func (b *ElasticSearchIndexer) Init() (bool, error) {
	ctx := graceful.GetManager().HammerContext()
	exists, err := b.client.IndexExists(b.indexerName).Do(ctx)
	if err != nil {
		return false, b.checkError(err)
	}

	if !exists {
		mapping := defaultMapping

		createIndex, err := b.client.CreateIndex(b.indexerName).BodyString(mapping).Do(ctx)
		if err != nil {
			return false, b.checkError(err)
		}
		if !createIndex.Acknowledged {
			return false, errors.New("init failed")
		}

		return false, nil
	}
	return true, nil
}

// Ping checks if elastic is available
func (b *ElasticSearchIndexer) Ping() bool {
	b.lock.RLock()
	defer b.lock.RUnlock()
	return b.available
}

// Index will save the index data
func (b *ElasticSearchIndexer) Index(issues []*IndexerData) error {
	if len(issues) == 0 {
		return nil
	} else if len(issues) == 1 {
		issue := issues[0]
		_, err := b.client.Index().
			Index(b.indexerName).
			Id(fmt.Sprintf("%d", issue.ID)).
			BodyJson(map[string]interface{}{
				"id":       issue.ID,
				"repo_id":  issue.RepoID,
				"title":    issue.Title,
				"content":  issue.Content,
				"comments": issue.Comments,
			}).
			Do(graceful.GetManager().HammerContext())
		return b.checkError(err)
	}

	reqs := make([]elastic.BulkableRequest, 0)
	for _, issue := range issues {
		reqs = append(reqs,
			elastic.NewBulkIndexRequest().
				Index(b.indexerName).
				Id(fmt.Sprintf("%d", issue.ID)).
				Doc(map[string]interface{}{
					"id":       issue.ID,
					"repo_id":  issue.RepoID,
					"title":    issue.Title,
					"content":  issue.Content,
					"comments": issue.Comments,
				}),
		)
	}

	_, err := b.client.Bulk().
		Index(b.indexerName).
		Add(reqs...).
		Do(graceful.GetManager().HammerContext())
	return b.checkError(err)
}

// Delete deletes indexes by ids
func (b *ElasticSearchIndexer) Delete(ids ...int64) error {
	if len(ids) == 0 {
		return nil
	} else if len(ids) == 1 {
		_, err := b.client.Delete().
			Index(b.indexerName).
			Id(fmt.Sprintf("%d", ids[0])).
			Do(graceful.GetManager().HammerContext())
		return b.checkError(err)
	}

	reqs := make([]elastic.BulkableRequest, 0)
	for _, id := range ids {
		reqs = append(reqs,
			elastic.NewBulkDeleteRequest().
				Index(b.indexerName).
				Id(fmt.Sprintf("%d", id)),
		)
	}

	_, err := b.client.Bulk().
		Index(b.indexerName).
		Add(reqs...).
		Do(graceful.GetManager().HammerContext())
	return b.checkError(err)
}

// Search searches for issues by given conditions.
// Returns the matching issue IDs
func (b *ElasticSearchIndexer) Search(ctx context.Context, keyword string, repoIDs []int64, limit, start int) (*SearchResult, error) {
	kwQuery := elastic.NewMultiMatchQuery(keyword, "title", "content", "comments")
	query := elastic.NewBoolQuery()
	query = query.Must(kwQuery)
	if len(repoIDs) > 0 {
		repoStrs := make([]interface{}, 0, len(repoIDs))
		for _, repoID := range repoIDs {
			repoStrs = append(repoStrs, repoID)
		}
		repoQuery := elastic.NewTermsQuery("repo_id", repoStrs...)
		query = query.Must(repoQuery)
	}
	searchResult, err := b.client.Search().
		Index(b.indexerName).
		Query(query).
		Sort("_score", false).
		From(start).Size(limit).
		Do(ctx)
	if err != nil {
		return nil, b.checkError(err)
	}

	hits := make([]Match, 0, limit)
	for _, hit := range searchResult.Hits.Hits {
		id, _ := strconv.ParseInt(hit.Id, 10, 64)
		hits = append(hits, Match{
			ID: id,
		})
	}

	return &SearchResult{
		Total: searchResult.TotalHits(),
		Hits:  hits,
	}, nil
}

// Close implements indexer
func (b *ElasticSearchIndexer) Close() {
	select {
	case <-b.stopTimer:
	default:
		close(b.stopTimer)
	}
}

func (b *ElasticSearchIndexer) checkError(err error) error {
	var opErr *net.OpError
	if !(elastic.IsConnErr(err) || (errors.As(err, &opErr) && (opErr.Op == "dial" || opErr.Op == "read"))) {
		return err
	}

	b.setAvailability(false)

	return err
}

func (b *ElasticSearchIndexer) checkAvailability() {
	if b.Ping() {
		return
	}

	// Request cluster state to check if elastic is available again
	_, err := b.client.ClusterState().Do(graceful.GetManager().ShutdownContext())
	if err != nil {
		b.setAvailability(false)
		return
	}

	b.setAvailability(true)
}

func (b *ElasticSearchIndexer) setAvailability(available bool) {
	b.lock.Lock()
	defer b.lock.Unlock()

	if b.available == available {
		return
	}

	b.available = available
}
```
+177
modules/indexer/issues/elasticsearch/elasticsearch.go
```go
// Copyright 2019 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package elasticsearch

import (
	"context"
	"fmt"
	"strconv"

	"code.gitea.io/gitea/modules/graceful"
	indexer_internal "code.gitea.io/gitea/modules/indexer/internal"
	inner_elasticsearch "code.gitea.io/gitea/modules/indexer/internal/elasticsearch"
	"code.gitea.io/gitea/modules/indexer/issues/internal"

	"github.com/olivere/elastic/v7"
)

const (
	issueIndexerLatestVersion = 0
)

var _ internal.Indexer = &Indexer{}

// Indexer implements Indexer interface
type Indexer struct {
	inner                    *inner_elasticsearch.Indexer
	indexer_internal.Indexer // do not composite inner_elasticsearch.Indexer directly to avoid exposing too much
}

// NewIndexer creates a new elasticsearch indexer
func NewIndexer(url, indexerName string) *Indexer {
	inner := inner_elasticsearch.NewIndexer(url, indexerName, issueIndexerLatestVersion, defaultMapping)
	indexer := &Indexer{
		inner:   inner,
		Indexer: inner,
	}
	return indexer
}

const (
	defaultMapping = `{
		"mappings": {
			"properties": {
				"id": {
					"type": "integer",
					"index": true
				},
				"repo_id": {
					"type": "integer",
					"index": true
				},
				"title": {
					"type": "text",
					"index": true
				},
				"content": {
					"type": "text",
					"index": true
				},
				"comments": {
					"type": "text",
					"index": true
				}
			}
		}
	}`
)

// Index will save the index data
func (b *Indexer) Index(ctx context.Context, issues []*internal.IndexerData) error {
	if len(issues) == 0 {
		return nil
	} else if len(issues) == 1 {
		issue := issues[0]
		_, err := b.inner.Client.Index().
			Index(b.inner.VersionedIndexName()).
			Id(fmt.Sprintf("%d", issue.ID)).
			BodyJson(map[string]interface{}{
				"id":       issue.ID,
				"repo_id":  issue.RepoID,
				"title":    issue.Title,
				"content":  issue.Content,
				"comments": issue.Comments,
			}).
			Do(ctx)
		return err
	}

	reqs := make([]elastic.BulkableRequest, 0)
	for _, issue := range issues {
		reqs = append(reqs,
			elastic.NewBulkIndexRequest().
				Index(b.inner.VersionedIndexName()).
				Id(fmt.Sprintf("%d", issue.ID)).
				Doc(map[string]interface{}{
					"id":       issue.ID,
					"repo_id":  issue.RepoID,
					"title":    issue.Title,
					"content":  issue.Content,
					"comments": issue.Comments,
				}),
		)
	}

	_, err := b.inner.Client.Bulk().
		Index(b.inner.VersionedIndexName()).
		Add(reqs...).
		Do(graceful.GetManager().HammerContext())
	return err
}

// Delete deletes indexes by ids
func (b *Indexer) Delete(ctx context.Context, ids ...int64) error {
	if len(ids) == 0 {
		return nil
	} else if len(ids) == 1 {
		_, err := b.inner.Client.Delete().
			Index(b.inner.VersionedIndexName()).
			Id(fmt.Sprintf("%d", ids[0])).
			Do(ctx)
		return err
	}

	reqs := make([]elastic.BulkableRequest, 0)
	for _, id := range ids {
		reqs = append(reqs,
			elastic.NewBulkDeleteRequest().
				Index(b.inner.VersionedIndexName()).
				Id(fmt.Sprintf("%d", id)),
		)
	}

	_, err := b.inner.Client.Bulk().
		Index(b.inner.VersionedIndexName()).
		Add(reqs...).
		Do(graceful.GetManager().HammerContext())
	return err
}

// Search searches for issues by given conditions.
// Returns the matching issue IDs
func (b *Indexer) Search(ctx context.Context, keyword string, repoIDs []int64, limit, start int) (*internal.SearchResult, error) {
	kwQuery := elastic.NewMultiMatchQuery(keyword, "title", "content", "comments")
	query := elastic.NewBoolQuery()
	query = query.Must(kwQuery)
	if len(repoIDs) > 0 {
		repoStrs := make([]interface{}, 0, len(repoIDs))
		for _, repoID := range repoIDs {
			repoStrs = append(repoStrs, repoID)
		}
		repoQuery := elastic.NewTermsQuery("repo_id", repoStrs...)
		query = query.Must(repoQuery)
	}
	searchResult, err := b.inner.Client.Search().
		Index(b.inner.VersionedIndexName()).
		Query(query).
		Sort("_score", false).
		From(start).Size(limit).
		Do(ctx)
	if err != nil {
		return nil, err
	}

	hits := make([]internal.Match, 0, limit)
	for _, hit := range searchResult.Hits.Hits {
		id, _ := strconv.ParseInt(hit.Id, 10, 64)
		hits = append(hits, internal.Match{
			ID: id,
		})
	}

	return &internal.SearchResult{
		Total: searchResult.TotalHits(),
		Hits:  hits,
	}, nil
}
```
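Every elasticsearch call above targets `b.inner.VersionedIndexName()` rather than the raw index name. That is what carries the PR's version scheme: the pre-existing unversioned index is treated as version 0, so it keeps its bare name, while future versions get distinct index names. The actual helper lives in `modules/indexer/internal/elasticsearch` and is not part of this diff; a plausible sketch, assuming a `name.vN` suffix convention:

```go
package main

import "fmt"

// versionedIndexName sketches a "name.vN" naming scheme. Version 0 maps to
// the bare legacy name so an old, unversioned index keeps working. The exact
// convention used by modules/indexer/internal is an assumption here.
func versionedIndexName(indexName string, version int) string {
	if version == 0 {
		return indexName // the old index had no version suffix
	}
	return fmt.Sprintf("%s.v%d", indexName, version)
}

func main() {
	fmt.Println(versionedIndexName("gitea_issues", 0))
	fmt.Println(versionedIndexName("gitea_issues", 1))
}
```

Bumping `issueIndexerLatestVersion` then naturally points writes and reads at a fresh index, leaving the old one untouched until it is cleaned up.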
+55 -144
modules/indexer/issues/indexer.go
```diff
 
 import (
 	"context"
-	"fmt"
 	"os"
 	"runtime/pprof"
-	"sync"
+	"sync/atomic"
 	"time"
 
-	"code.gitea.io/gitea/models/db"
+	db_model "code.gitea.io/gitea/models/db"
 	issues_model "code.gitea.io/gitea/models/issues"
 	repo_model "code.gitea.io/gitea/models/repo"
 	"code.gitea.io/gitea/modules/graceful"
+	"code.gitea.io/gitea/modules/indexer/issues/bleve"
+	"code.gitea.io/gitea/modules/indexer/issues/db"
+	"code.gitea.io/gitea/modules/indexer/issues/elasticsearch"
+	"code.gitea.io/gitea/modules/indexer/issues/internal"
+	"code.gitea.io/gitea/modules/indexer/issues/meilisearch"
 	"code.gitea.io/gitea/modules/log"
 	"code.gitea.io/gitea/modules/process"
 	"code.gitea.io/gitea/modules/queue"
···
 	"code.gitea.io/gitea/modules/util"
 )
 
-// IndexerData data stored in the issue indexer
-type IndexerData struct {
-	ID       int64    `json:"id"`
-	RepoID   int64    `json:"repo_id"`
-	Title    string   `json:"title"`
-	Content  string   `json:"content"`
-	Comments []string `json:"comments"`
-	IsDelete bool     `json:"is_delete"`
-	IDs      []int64  `json:"ids"`
-}
-
-// Match represents on search result
-type Match struct {
-	ID    int64   `json:"id"`
-	Score float64 `json:"score"`
-}
-
-// SearchResult represents search results
-type SearchResult struct {
-	Total int64
-	Hits  []Match
-}
-
-// Indexer defines an interface to indexer issues contents
-type Indexer interface {
-	Init() (bool, error)
-	Ping() bool
-	Index(issue []*IndexerData) error
-	Delete(ids ...int64) error
-	Search(ctx context.Context, kw string, repoIDs []int64, limit, start int) (*SearchResult, error)
-	Close()
-}
-
-type indexerHolder struct {
-	indexer   Indexer
-	mutex     sync.RWMutex
-	cond      *sync.Cond
-	cancelled bool
-}
-
-func newIndexerHolder() *indexerHolder {
-	h := &indexerHolder{}
-	h.cond = sync.NewCond(h.mutex.RLocker())
-	return h
-}
-
-func (h *indexerHolder) cancel() {
-	h.mutex.Lock()
-	defer h.mutex.Unlock()
-	h.cancelled = true
-	h.cond.Broadcast()
-}
-
-func (h *indexerHolder) set(indexer Indexer) {
-	h.mutex.Lock()
-	defer h.mutex.Unlock()
-	h.indexer = indexer
-	h.cond.Broadcast()
-}
-
-func (h *indexerHolder) get() Indexer {
-	h.mutex.RLock()
-	defer h.mutex.RUnlock()
-	if h.indexer == nil && !h.cancelled {
-		h.cond.Wait()
-	}
-	return h.indexer
-}
-
 var (
 	// issueIndexerQueue queue of issue ids to be updated
-	issueIndexerQueue *queue.WorkerPoolQueue[*IndexerData]
-	holder            = newIndexerHolder()
+	issueIndexerQueue *queue.WorkerPoolQueue[*internal.IndexerData]
+	// globalIndexer is the global indexer, it cannot be nil.
+	// When the real indexer is not ready, it will be a dummy indexer which will return an error to explain it's not ready.
+	// So it's always safe to use it as *globalIndexer.Load() and call its methods.
+	globalIndexer atomic.Pointer[internal.Indexer]
+	dummyIndexer  *internal.Indexer
 )
 
+func init() {
+	i := internal.NewDummyIndexer()
+	dummyIndexer = &i
+	globalIndexer.Store(dummyIndexer)
+}
+
 // InitIssueIndexer initializes the issue indexer; if syncReindex is true, reindex until
 // all issue indexing is done.
 func InitIssueIndexer(syncReindex bool) {
···
 	// Create the Queue
 	switch setting.Indexer.IssueType {
 	case "bleve", "elasticsearch", "meilisearch":
-		handler := func(items ...*IndexerData) (unhandled []*IndexerData) {
-			indexer := holder.get()
-			if indexer == nil {
-				log.Warn("Issue indexer handler: indexer is not ready, retry later.")
-				return items
-			}
-			toIndex := make([]*IndexerData, 0, len(items))
+		handler := func(items ...*internal.IndexerData) (unhandled []*internal.IndexerData) {
+			indexer := *globalIndexer.Load()
+			toIndex := make([]*internal.IndexerData, 0, len(items))
 			for _, indexerData := range items {
 				log.Trace("IndexerData Process: %d %v %t", indexerData.ID, indexerData.IDs, indexerData.IsDelete)
 				if indexerData.IsDelete {
-					if err := indexer.Delete(indexerData.IDs...); err != nil {
+					if err := indexer.Delete(ctx, indexerData.IDs...); err != nil {
 						log.Error("Issue indexer handler: failed to delete from index: %v Error: %v", indexerData.IDs, err)
-						if !indexer.Ping() {
-							log.Error("Issue indexer handler: indexer is unavailable when deleting")
-							unhandled = append(unhandled, indexerData)
-						}
+						unhandled = append(unhandled, indexerData)
 					}
 					continue
 				}
 				toIndex = append(toIndex, indexerData)
 			}
-			if err := indexer.Index(toIndex); err != nil {
+			if err := indexer.Index(ctx, toIndex); err != nil {
 				log.Error("Error whilst indexing: %v Error: %v", toIndex, err)
-				if !indexer.Ping() {
-					log.Error("Issue indexer handler: indexer is unavailable when indexing")
-					unhandled = append(unhandled, toIndex...)
-				}
+				unhandled = append(unhandled, toIndex...)
 			}
 			return unhandled
 		}
···
 		log.Fatal("Unable to create issue indexer queue")
 	}
 	default:
-		issueIndexerQueue = queue.CreateSimpleQueue[*IndexerData](ctx, "issue_indexer", nil)
+		issueIndexerQueue = queue.CreateSimpleQueue[*internal.IndexerData](ctx, "issue_indexer", nil)
 	}
 
 	graceful.GetManager().RunAtTerminate(finished)
···
 		pprof.SetGoroutineLabels(ctx)
 		start := time.Now()
 		log.Info("PID %d: Initializing Issue Indexer: %s", os.Getpid(), setting.Indexer.IssueType)
-		var populate bool
+		var (
+			issueIndexer internal.Indexer
+			existed      bool
+			err          error
+		)
 		switch setting.Indexer.IssueType {
 		case "bleve":
 			defer func() {
···
 					log.Error("PANIC whilst initializing issue indexer: %v\nStacktrace: %s", err, log.Stack(2))
 					log.Error("The indexer files are likely corrupted and may need to be deleted")
 					log.Error("You can completely remove the %q directory to make Gitea recreate the indexes", setting.Indexer.IssuePath)
-					holder.cancel()
+					globalIndexer.Store(dummyIndexer)
 					log.Fatal("PID: %d Unable to initialize the Bleve Issue Indexer at path: %s Error: %v", os.Getpid(), setting.Indexer.IssuePath, err)
 				}
 			}()
-			issueIndexer := NewBleveIndexer(setting.Indexer.IssuePath)
-			exist, err := issueIndexer.Init()
+			issueIndexer = bleve.NewIndexer(setting.Indexer.IssuePath)
+			existed, err = issueIndexer.Init(ctx)
 			if err != nil {
-				holder.cancel()
 				log.Fatal("Unable to initialize Bleve Issue Indexer at path: %s Error: %v", setting.Indexer.IssuePath, err)
 			}
-			populate = !exist
-			holder.set(issueIndexer)
-			graceful.GetManager().RunAtTerminate(func() {
-				log.Debug("Closing issue indexer")
-				issueIndexer := holder.get()
-				if issueIndexer != nil {
-					issueIndexer.Close()
-				}
-				log.Info("PID: %d Issue Indexer closed", os.Getpid())
-			})
-			log.Debug("Created Bleve Indexer")
 		case "elasticsearch":
-			issueIndexer, err := NewElasticSearchIndexer(setting.Indexer.IssueConnStr, setting.Indexer.IssueIndexerName)
-			if err != nil {
-				log.Fatal("Unable to initialize Elastic Search Issue Indexer at connection: %s Error: %v", setting.Indexer.IssueConnStr, err)
-			}
-			exist, err := issueIndexer.Init()
+			issueIndexer = elasticsearch.NewIndexer(setting.Indexer.IssueConnStr, setting.Indexer.IssueIndexerName)
+			existed, err = issueIndexer.Init(ctx)
 			if err != nil {
 				log.Fatal("Unable to issueIndexer.Init with connection %s Error: %v", setting.Indexer.IssueConnStr, err)
 			}
-			populate = !exist
-			holder.set(issueIndexer)
 		case "db":
-			issueIndexer := &DBIndexer{}
-			holder.set(issueIndexer)
+			issueIndexer = db.NewIndexer()
 		case "meilisearch":
-			issueIndexer, err := NewMeilisearchIndexer(setting.Indexer.IssueConnStr, setting.Indexer.IssueConnAuth, setting.Indexer.IssueIndexerName)
-			if err != nil {
-				log.Fatal("Unable to initialize Meilisearch Issue Indexer at connection: %s Error: %v", setting.Indexer.IssueConnStr, err)
-			}
-			exist, err := issueIndexer.Init()
+			issueIndexer = meilisearch.NewIndexer(setting.Indexer.IssueConnStr, setting.Indexer.IssueConnAuth, setting.Indexer.IssueIndexerName)
+			existed, err = issueIndexer.Init(ctx)
 			if err != nil {
 				log.Fatal("Unable to issueIndexer.Init with connection %s Error: %v", setting.Indexer.IssueConnStr, err)
 			}
-			populate = !exist
-			holder.set(issueIndexer)
 		default:
-			holder.cancel()
 			log.Fatal("Unknown issue indexer type: %s", setting.Indexer.IssueType)
 		}
+		globalIndexer.Store(&issueIndexer)
+
+		graceful.GetManager().RunAtTerminate(func() {
+			log.Debug("Closing issue indexer")
+			(*globalIndexer.Load()).Close()
+			log.Info("PID: %d Issue Indexer closed", os.Getpid())
+		})
 
 		// Start processing the queue
 		go graceful.GetManager().RunWithCancel(issueIndexerQueue)
 
 		// Populate the index
-		if populate {
+		if !existed {
 			if syncReindex {
 				graceful.GetManager().RunWithShutdownContext(populateIssueIndexer)
 			} else {
···
 		default:
 		}
 		repos, _, err := repo_model.SearchRepositoryByName(ctx, &repo_model.SearchRepoOptions{
-			ListOptions: db.ListOptions{Page: page, PageSize: repo_model.RepositoryListDefaultPageSize},
-			OrderBy:     db.SearchOrderByID,
+			ListOptions: db_model.ListOptions{Page: page, PageSize: repo_model.RepositoryListDefaultPageSize},
+			OrderBy:     db_model.SearchOrderByID,
 			Private:     true,
 			Collaborate: util.OptionalBoolFalse,
 		})
···
 				comments = append(comments, comment.Content)
 			}
 		}
-		indexerData := &IndexerData{
+		indexerData := &internal.IndexerData{
 			ID:      issue.ID,
 			RepoID:  issue.RepoID,
 			Title:   issue.Title,
···
 	if len(ids) == 0 {
 		return
 	}
-	indexerData := &IndexerData{
+	indexerData := &internal.IndexerData{
 		IDs:      ids,
 		IsDelete: true,
 	}
···
 // WARNING: You have to ensure the user has permission to visit repoIDs' issues
 func SearchIssuesByKeyword(ctx context.Context, repoIDs []int64, keyword string) ([]int64, error) {
 	var issueIDs []int64
-	indexer := holder.get()
-
-	if indexer == nil {
-		log.Error("SearchIssuesByKeyword(): unable to get indexer!")
-		return nil, fmt.Errorf("unable to get issue indexer")
-	}
+	indexer := *globalIndexer.Load()
 	res, err := indexer.Search(ctx, keyword, repoIDs, 50, 0)
 	if err != nil {
 		return nil, err
···
 }
 
 // IsAvailable checks if issue indexer is available
-func IsAvailable() bool {
-	indexer := holder.get()
-	if indexer == nil {
-		log.Error("IsAvailable(): unable to get indexer!")
-		return false
-	}
-
-	return indexer.Ping()
+func IsAvailable(ctx context.Context) bool {
+	return (*globalIndexer.Load()).Ping(ctx) == nil
 }
```
`modules/indexer/issues/indexer_test.go` (+2 -2)

```diff
 	"time"
 
 	"code.gitea.io/gitea/models/unittest"
+	"code.gitea.io/gitea/modules/indexer/issues/bleve"
 	"code.gitea.io/gitea/modules/setting"
 
 	_ "code.gitea.io/gitea/models"
···
 	setting.LoadQueueSettings()
 	InitIssueIndexer(true)
 	defer func() {
-		indexer := holder.get()
-		if bleveIndexer, ok := indexer.(*BleveIndexer); ok {
+		if bleveIndexer, ok := (*globalIndexer.Load()).(*bleve.Indexer); ok {
 			bleveIndexer.Close()
 		}
 	}()
```
`modules/indexer/issues/internal/indexer.go` (+42)

```diff
+// Copyright 2023 The Gitea Authors. All rights reserved.
+// SPDX-License-Identifier: MIT
+
+package internal
+
+import (
+	"context"
+	"fmt"
+
+	"code.gitea.io/gitea/modules/indexer/internal"
+)
+
+// Indexer defines an interface to indexer issues contents
+type Indexer interface {
+	internal.Indexer
+	Index(ctx context.Context, issue []*IndexerData) error
+	Delete(ctx context.Context, ids ...int64) error
+	Search(ctx context.Context, kw string, repoIDs []int64, limit, start int) (*SearchResult, error)
+}
+
+// NewDummyIndexer returns a dummy indexer
+func NewDummyIndexer() Indexer {
+	return &dummyIndexer{
+		Indexer: internal.NewDummyIndexer(),
+	}
+}
+
+type dummyIndexer struct {
+	internal.Indexer
+}
+
+func (d *dummyIndexer) Index(ctx context.Context, issue []*IndexerData) error {
+	return fmt.Errorf("indexer is not ready")
+}
+
+func (d *dummyIndexer) Delete(ctx context.Context, ids ...int64) error {
+	return fmt.Errorf("indexer is not ready")
+}
+
+func (d *dummyIndexer) Search(ctx context.Context, kw string, repoIDs []int64, limit, start int) (*SearchResult, error) {
+	return nil, fmt.Errorf("indexer is not ready")
+}
```
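The dummy indexer above works by embedding the shared `internal.Indexer` dummy and overriding only the issue-specific methods; Go forwards every non-overridden call to the embedded value. A toy illustration of that embedding pattern (the `Base`/`baseDummy`/`issueDummy` names are hypothetical, not from the PR):

```go
package main

import (
	"errors"
	"fmt"
)

// Base plays the role of the shared internal.Indexer interface.
type Base interface {
	Ping() error
	Name() string
}

// baseDummy plays the role of internal.NewDummyIndexer()'s return value.
type baseDummy struct{}

func (baseDummy) Ping() error  { return errors.New("not ready") }
func (baseDummy) Name() string { return "dummy" }

// issueDummy embeds the shared dummy and adds only the issue-specific method,
// mirroring how dummyIndexer layers on internal.NewDummyIndexer().
type issueDummy struct {
	Base
}

func (issueDummy) Search(kw string) error { return errors.New("indexer is not ready") }

func main() {
	d := issueDummy{Base: baseDummy{}}
	fmt.Println(d.Name())      // forwarded to the embedded baseDummy
	fmt.Println(d.Search("x")) // handled by issueDummy itself
}
```

This keeps each engine-agnostic method (`Init`, `Ping`, `Close`) defined exactly once in the shared internal package.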
`modules/indexer/issues/internal/model.go` (+27)

```diff
+// Copyright 2023 The Gitea Authors. All rights reserved.
+// SPDX-License-Identifier: MIT
+
+package internal
+
+// IndexerData data stored in the issue indexer
+type IndexerData struct {
+	ID       int64    `json:"id"`
+	RepoID   int64    `json:"repo_id"`
+	Title    string   `json:"title"`
+	Content  string   `json:"content"`
+	Comments []string `json:"comments"`
+	IsDelete bool     `json:"is_delete"`
+	IDs      []int64  `json:"ids"`
+}
+
+// Match represents on search result
+type Match struct {
+	ID    int64   `json:"id"`
+	Score float64 `json:"score"`
+}
+
+// SearchResult represents search results
+type SearchResult struct {
+	Total int64
+	Hits  []Match
+}
```
`modules/indexer/issues/meilisearch.go` (-173)

```diff
-// Copyright 2023 The Gitea Authors. All rights reserved.
-// SPDX-License-Identifier: MIT
-
-package issues
-
-import (
-	"context"
-	"strconv"
-	"strings"
-	"sync"
-	"time"
-
-	"github.com/meilisearch/meilisearch-go"
-)
-
-var _ Indexer = &MeilisearchIndexer{}
-
-// MeilisearchIndexer implements Indexer interface
-type MeilisearchIndexer struct {
-	client      *meilisearch.Client
-	indexerName string
-	available   bool
-	stopTimer   chan struct{}
-	lock        sync.RWMutex
-}
-
-// MeilisearchIndexer creates a new meilisearch indexer
-func NewMeilisearchIndexer(url, apiKey, indexerName string) (*MeilisearchIndexer, error) {
-	client := meilisearch.NewClient(meilisearch.ClientConfig{
-		Host:   url,
-		APIKey: apiKey,
-	})
-
-	indexer := &MeilisearchIndexer{
-		client:      client,
-		indexerName: indexerName,
-		available:   true,
-		stopTimer:   make(chan struct{}),
-	}
-
-	ticker := time.NewTicker(10 * time.Second)
-	go func() {
-		for {
-			select {
-			case <-ticker.C:
-				indexer.checkAvailability()
-			case <-indexer.stopTimer:
-				ticker.Stop()
-				return
-			}
-		}
-	}()
-
-	return indexer, nil
-}
-
-// Init will initialize the indexer
-func (b *MeilisearchIndexer) Init() (bool, error) {
-	_, err := b.client.GetIndex(b.indexerName)
-	if err == nil {
-		return true, nil
-	}
-	_, err = b.client.CreateIndex(&meilisearch.IndexConfig{
-		Uid:        b.indexerName,
-		PrimaryKey: "id",
-	})
-	if err != nil {
-		return false, b.checkError(err)
-	}
-
-	_, err = b.client.Index(b.indexerName).UpdateFilterableAttributes(&[]string{"repo_id"})
-	return false, b.checkError(err)
-}
-
-// Ping checks if meilisearch is available
-func (b *MeilisearchIndexer) Ping() bool {
-	b.lock.RLock()
-	defer b.lock.RUnlock()
-	return b.available
-}
-
-// Index will save the index data
-func (b *MeilisearchIndexer) Index(issues []*IndexerData) error {
-	if len(issues) == 0 {
-		return nil
-	}
-	for _, issue := range issues {
-		_, err := b.client.Index(b.indexerName).AddDocuments(issue)
-		if err != nil {
-			return b.checkError(err)
-		}
-	}
-	// TODO: bulk send index data
-	return nil
-}
-
-// Delete deletes indexes by ids
-func (b *MeilisearchIndexer) Delete(ids ...int64) error {
-	if len(ids) == 0 {
-		return nil
-	}
-
-	for _, id := range ids {
-		_, err := b.client.Index(b.indexerName).DeleteDocument(strconv.FormatInt(id, 10))
-		if err != nil {
-			return b.checkError(err)
-		}
-	}
-	// TODO: bulk send deletes
-	return nil
-}
-
-// Search searches for issues by given conditions.
-// Returns the matching issue IDs
-func (b *MeilisearchIndexer) Search(ctx context.Context, keyword string, repoIDs []int64, limit, start int) (*SearchResult, error) {
-	repoFilters := make([]string, 0, len(repoIDs))
-	for _, repoID := range repoIDs {
-		repoFilters = append(repoFilters, "repo_id = "+strconv.FormatInt(repoID, 10))
-	}
-	filter := strings.Join(repoFilters, " OR ")
-	searchRes, err := b.client.Index(b.indexerName).Search(keyword, &meilisearch.SearchRequest{
-		Filter: filter,
-		Limit:  int64(limit),
-		Offset: int64(start),
-	})
-	if err != nil {
-		return nil, b.checkError(err)
-	}
-
-	hits := make([]Match, 0, len(searchRes.Hits))
-	for _, hit := range searchRes.Hits {
-		hits = append(hits, Match{
-			ID: int64(hit.(map[string]interface{})["id"].(float64)),
-		})
-	}
-	return &SearchResult{
-		Total: searchRes.TotalHits,
-		Hits:  hits,
-	}, nil
-}
-
-// Close implements indexer
-func (b *MeilisearchIndexer) Close() {
-	select {
-	case <-b.stopTimer:
-	default:
-		close(b.stopTimer)
-	}
-}
-
-func (b *MeilisearchIndexer) checkError(err error) error {
-	return err
-}
-
-func (b *MeilisearchIndexer) checkAvailability() {
-	_, err := b.client.Health()
-	if err != nil {
-		b.setAvailability(false)
-		return
-	}
-	b.setAvailability(true)
-}
-
-func (b *MeilisearchIndexer) setAvailability(available bool) {
-	b.lock.Lock()
-	defer b.lock.Unlock()
-
-	if b.available == available {
-		return
-	}
-
-	b.available = available
-}
```
`modules/indexer/issues/meilisearch/meilisearch.go` (+98)

```diff
+// Copyright 2023 The Gitea Authors. All rights reserved.
+// SPDX-License-Identifier: MIT
+
+package meilisearch
+
+import (
+	"context"
+	"strconv"
+	"strings"
+
+	indexer_internal "code.gitea.io/gitea/modules/indexer/internal"
+	inner_meilisearch "code.gitea.io/gitea/modules/indexer/internal/meilisearch"
+	"code.gitea.io/gitea/modules/indexer/issues/internal"
+
+	"github.com/meilisearch/meilisearch-go"
+)
+
+const (
+	issueIndexerLatestVersion = 0
+)
+
+var _ internal.Indexer = &Indexer{}
+
+// Indexer implements Indexer interface
+type Indexer struct {
+	inner                   *inner_meilisearch.Indexer
+	indexer_internal.Indexer // do not composite inner_meilisearch.Indexer directly to avoid exposing too much
+}
+
+// NewIndexer creates a new meilisearch indexer
+func NewIndexer(url, apiKey, indexerName string) *Indexer {
+	inner := inner_meilisearch.NewIndexer(url, apiKey, indexerName, issueIndexerLatestVersion)
+	indexer := &Indexer{
+		inner:   inner,
+		Indexer: inner,
+	}
+	return indexer
+}
+
+// Index will save the index data
+func (b *Indexer) Index(_ context.Context, issues []*internal.IndexerData) error {
+	if len(issues) == 0 {
+		return nil
+	}
+	for _, issue := range issues {
+		_, err := b.inner.Client.Index(b.inner.VersionedIndexName()).AddDocuments(issue)
+		if err != nil {
+			return err
+		}
+	}
+	// TODO: bulk send index data
+	return nil
+}
+
+// Delete deletes indexes by ids
+func (b *Indexer) Delete(_ context.Context, ids ...int64) error {
+	if len(ids) == 0 {
+		return nil
+	}
+
+	for _, id := range ids {
+		_, err := b.inner.Client.Index(b.inner.VersionedIndexName()).DeleteDocument(strconv.FormatInt(id, 10))
+		if err != nil {
+			return err
+		}
+	}
+	// TODO: bulk send deletes
+	return nil
+}
+
+// Search searches for issues by given conditions.
+// Returns the matching issue IDs
+func (b *Indexer) Search(ctx context.Context, keyword string, repoIDs []int64, limit, start int) (*internal.SearchResult, error) {
+	repoFilters := make([]string, 0, len(repoIDs))
+	for _, repoID := range repoIDs {
+		repoFilters = append(repoFilters, "repo_id = "+strconv.FormatInt(repoID, 10))
+	}
+	filter := strings.Join(repoFilters, " OR ")
+	searchRes, err := b.inner.Client.Index(b.inner.VersionedIndexName()).Search(keyword, &meilisearch.SearchRequest{
+		Filter: filter,
+		Limit:  int64(limit),
+		Offset: int64(start),
+	})
+	if err != nil {
+		return nil, err
+	}
+
+	hits := make([]internal.Match, 0, len(searchRes.Hits))
+	for _, hit := range searchRes.Hits {
+		hits = append(hits, internal.Match{
+			ID: int64(hit.(map[string]interface{})["id"].(float64)),
+		})
+	}
+	return &internal.SearchResult{
+		Total: searchRes.TotalHits,
+		Hits:  hits,
+	}, nil
+}
```
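`Search` above builds a Meilisearch filter expression by OR-ing one `repo_id = N` clause per repository (which is why `Init` marks `repo_id` as a filterable attribute). Extracted as a standalone sketch for clarity — `buildRepoFilter` is a hypothetical helper, not a function in the PR:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildRepoFilter mirrors the filter construction in Search: each repo ID
// becomes a "repo_id = N" clause, joined with " OR " into one expression.
func buildRepoFilter(repoIDs []int64) string {
	repoFilters := make([]string, 0, len(repoIDs))
	for _, repoID := range repoIDs {
		repoFilters = append(repoFilters, "repo_id = "+strconv.FormatInt(repoID, 10))
	}
	return strings.Join(repoFilters, " OR ")
}

func main() {
	fmt.Println(buildRepoFilter([]int64{1, 42})) // repo_id = 1 OR repo_id = 42
	fmt.Println(buildRepoFilter(nil) == "")      // no repos -> empty filter: true
}
```

An empty ID list yields an empty filter string, which Meilisearch treats as "no filter", so the caller's permission check on `repoIDs` matters before the query is issued.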
`modules/indexer/stats/indexer.go` (+1)

```diff
 )
 
 // Indexer defines an interface to index repository stats
+// TODO: this indexer is quite different from the others, maybe this package should be moved out from module/indexer
 type Indexer interface {
 	Index(id int64) error
 	Close()
```
`routers/web/explore/code.go` (+2 -2)

```diff
 	if (len(repoIDs) > 0) || isAdmin {
 		total, searchResults, searchResultLanguages, err = code_indexer.PerformSearch(ctx, repoIDs, language, keyword, page, setting.UI.RepoSearchPagingNum, isMatch)
 		if err != nil {
-			if code_indexer.IsAvailable() {
+			if code_indexer.IsAvailable(ctx) {
 				ctx.ServerError("SearchResults", err)
 				return
 			}
 			ctx.Data["CodeIndexerUnavailable"] = true
 		} else {
-			ctx.Data["CodeIndexerUnavailable"] = !code_indexer.IsAvailable()
+			ctx.Data["CodeIndexerUnavailable"] = !code_indexer.IsAvailable(ctx)
 		}
 
 		loadRepoIDs := make([]int64, 0, len(searchResults))
```
`routers/web/repo/issue.go` (+1 -1)

```diff
 	if len(keyword) > 0 {
 		issueIDs, err = issue_indexer.SearchIssuesByKeyword(ctx, []int64{repo.ID}, keyword)
 		if err != nil {
-			if issue_indexer.IsAvailable() {
+			if issue_indexer.IsAvailable(ctx) {
 				ctx.ServerError("issueIndexer.Search", err)
 				return
 			}
```
`routers/web/repo/search.go` (+2 -2)

```diff
 	total, searchResults, searchResultLanguages, err := code_indexer.PerformSearch(ctx, []int64{ctx.Repo.Repository.ID},
 		language, keyword, page, setting.UI.RepoSearchPagingNum, isMatch)
 	if err != nil {
-		if code_indexer.IsAvailable() {
+		if code_indexer.IsAvailable(ctx) {
 			ctx.ServerError("SearchResults", err)
 			return
 		}
 		ctx.Data["CodeIndexerUnavailable"] = true
 	} else {
-		ctx.Data["CodeIndexerUnavailable"] = !code_indexer.IsAvailable()
+		ctx.Data["CodeIndexerUnavailable"] = !code_indexer.IsAvailable(ctx)
 	}
 
 	ctx.Data["SourcePath"] = ctx.Repo.Repository.Link()
```
`routers/web/user/code.go` (+2 -2)

```diff
 	if len(repoIDs) > 0 {
 		total, searchResults, searchResultLanguages, err = code_indexer.PerformSearch(ctx, repoIDs, language, keyword, page, setting.UI.RepoSearchPagingNum, isMatch)
 		if err != nil {
-			if code_indexer.IsAvailable() {
+			if code_indexer.IsAvailable(ctx) {
 				ctx.ServerError("SearchResults", err)
 				return
 			}
 			ctx.Data["CodeIndexerUnavailable"] = true
 		} else {
-			ctx.Data["CodeIndexerUnavailable"] = !code_indexer.IsAvailable()
+			ctx.Data["CodeIndexerUnavailable"] = !code_indexer.IsAvailable(ctx)
 		}
 
 		loadRepoIDs := make([]int64, 0, len(searchResults))
```