sweep.dev Open in urlscan Pro
52.58.254.253 Public Scan

Submitted URL:
http://sweep.dev/
Effective URL:
https://sweep.dev/
Submission: On March 05 via api (March 5th 2024, 12:27:46 am UTC) from US — Scanned from DE
Form analysis
0 forms found in the DOM

Text Content


SWEEP: UNIT TEST MY DATA PIPELINE

AI Junior Developer that handles small features in your codebase

  Install Sweep  Join our Discord 



Clean up your tech debt, automatically

Sweep generates repository-level code at your command. Cut down your dev time on
mundane tasks, like tests, documentation, and refactoring.

Refactor vector_db.py by making get_deeplake_vs_from_repo more modular


This PR refactors the get_deeplake_vs_from_repo function in
sweepai/core/vector_db.py to make it more modular. The function was quite large
and performed multiple tasks, including reading files from a repository,
preparing a lexical search index, scoring for vector search, computing all
scores, preparing documents, metadatas, and ids, and computing embeddings.



sweepai/core/vector_db.py

--------------------------------------------------------------------------------

...

     logger.info("Recursively getting list of files...")
     blocked_dirs = get_blocked_dirs(repo)
     sweep_config.exclude_dirs.extend(blocked_dirs)
-
-    snippets, file_list = repo_to_chunks(cloned_repo.cache_dir, sweep_config)
-    logger.info(f"Found {len(snippets)} snippets in repository {repo_full_name}")
-    # prepare lexical search
-    index = prepare_index_from_snippets(
-        snippets, len_repo_cache_dir=len(cloned_repo.cache_dir) + 1
-    )
-    logger.print("Prepared index from snippets")
+    file_list, snippets, index = prepare_lexical_search_index(cloned_repo, sweep_config, repo_full_name)

     # scoring for vector search
     files_to_scores = compute_vector_search_scores(
         file_list, cloned_repo, repo_full_name
     )

...

+def prepare_lexical_search_index(cloned_repo, sweep_config, repo_full_name):
+    snippets, file_list = repo_to_chunks(cloned_repo.cache_dir, sweep_config)
+    logger.info(f"Found {len(snippets)} snippets in repository {repo_full_name}")
+    # prepare lexical search
+    index = prepare_index_from_snippets(
+        snippets, len_repo_cache_dir=len(cloned_repo.cache_dir) + 1
+    )
+    logger.print("Prepared index from snippets")
+    return file_list, snippets, index



I made a Pull Request for you at #2470!
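
The refactor above is an extract-function change: the chunking and indexing steps move into one helper that returns all three values the caller needs. A minimal, self-contained sketch of the same pattern follows. The `FakeRepo` class and the bodies of `repo_to_chunks` and `prepare_index_from_snippets` are hypothetical stand-ins written for illustration, not the code in sweepai/core/vector_db.py.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRepo:
    """Hypothetical stand-in for the cloned repository object."""
    cache_dir: str
    files: dict = field(default_factory=dict)  # path -> file text


def repo_to_chunks(repo):
    """Stub chunker (assumption): one snippet per non-empty line."""
    snippets = [
        line
        for text in repo.files.values()
        for line in text.splitlines()
        if line.strip()
    ]
    return snippets, list(repo.files)


def prepare_index_from_snippets(snippets):
    """Stub index (assumption): map each word to the snippets containing it."""
    index = {}
    for position, snippet in enumerate(snippets):
        for word in snippet.split():
            index.setdefault(word, []).append(position)
    return index


def prepare_lexical_search_index(repo):
    """Extracted helper: chunk the repository, then build the index."""
    snippets, file_list = repo_to_chunks(repo)
    index = prepare_index_from_snippets(snippets)
    return file_list, snippets, index


repo = FakeRepo(cache_dir="/tmp/repo", files={"a.py": "x = 1\n\ny = 2\n"})
file_list, snippets, index = prepare_lexical_search_index(repo)
```

Because the helper takes its inputs as arguments and returns its results, it can be exercised with a small fake repository like the one above, without running the rest of the larger function.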

Unit test your new features, automatically

Sweep writes unit tests and fixes any broken business logic they catch.

See the example

kevinlu1248 pushed 1 commit to sweepai/sweep, editing sweepai/utils/graph_test.py


Let me write and run a unit test for the new GraphChild component.



sweepai/utils/graph_test.py

--------------------------------------------------------------------------------

class TestGraph(unittest.TestCase):
    def setUp(self):
        self.graph = Graph(
            definitions_graph=nx.DiGraph(), references_graph=nx.DiGraph()
        )

    def test_extract_first_degree(self):
        with patch(
            "sweepai.utils.graph.Graph.find_definitions"
        ) as mock_find_definitions, patch(
            "sweepai.utils.graph.Graph.find_references"
        ) as mock_find_references, patch(
            "sweepai.utils.graph.condense_paths"
        ) as mock_condense_paths:
            mock_find_definitions.return_value = [["file1", "symbol1", "file2"]]
            mock_find_references.return_value = [["file1", "symbol1", "file2"]]
            mock_condense_paths.return_value = [["file1", "symbol1", "file2"]]
            with patch(
                "sweepai.utils.graph.Graph.topological_sort"
            ) as mock_topological_sort:
                mock_topological_sort.return_value = ["file1", "file2"]
                result = self.graph.topological_sort(["file1", "file2"])
                self.assertEqual(result, ["file1", "file2"])

...

--------------------------------------------------------------------------------

> python -m unittest -v sweepai/utils/graph_test.py

Traceback (most recent call last):
File "/repo/sweepai/utils/graph_test.py", line 35, in test_extract_first_degree
  self.assertEqual(result, ["file1", "file2"])
AssertionError: Lists differ: [] != ['file1', 'file2']
Second list contains 2 additional elements.
First extra element 0:
'file1'
- []
+ ['file1', 'file2']
======================================================================
FAIL: test_paths_to_first_degree_entities (__main__.TestGraph)


It looks like there's an edge case when there are empty strings in the path. Let
me fix that.



sweepai/core/graph_child.py

--------------------------------------------------------------------------------

 class Graph(BaseModel):
     ...

     def paths_to_first_degree_entities(self, file_paths: list[str]):
-        return "\n".join(
-            [self.extract_first_degree(file_path) for file_path in file_paths]
-        )
+        paths = [self.extract_first_degree(file_path) for file_path in file_paths]
+        if paths and paths[-1] == "":
+            paths = paths[:-1]
+        return "\n".join(paths)


I made a Pull Request for you at #2380!
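
To see what the fix above changes, here is a small standalone sketch of the before and after behavior. It assumes the per-file results are already available as a plain list of strings; the function names `join_paths_before` and `join_paths_after` are made up for this illustration and do not appear in the repository.

```python
def join_paths_before(paths):
    """Original behavior: join every entry, including a trailing empty one."""
    return "\n".join(paths)


def join_paths_after(paths):
    """Patched behavior: drop one trailing empty entry before joining."""
    if paths and paths[-1] == "":
        paths = paths[:-1]
    return "\n".join(paths)


# Hypothetical results where the last file produced no entities.
entities = ["file1.py", "file2.py", ""]

before = join_paths_before(entities)  # ends with a dangling newline
after = join_paths_after(entities)    # no dangling newline
```

Note that this guard only removes a single empty string at the end of the list. Empty strings elsewhere in the list still produce blank lines, and an empty list still joins to an empty string.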

See example tickets, handled by Sweep

  Example Sweep tickets

What our users are saying

Cody Ebberson

Co-founder + CTO at Medplum 

The true ingenuity lies in their many product intricacies ... from adhering to
coding standards and integrating with static analysis, to responsive feedback.
Their commitment to getting every detail right, paired with a swift
incorporation of lessons into tangible enhancements, really sets Sweep AI apart.

Ashaya Sharma

Co-founder + CTO at Honeycomb.ai 

I was able to get it to make a PR which took my junior developer 2 days to do.
Been great so far.

Eduardo Reis

AI @ Stanford AIMI 

✨Wow! Just found sweep.dev from @wwzeng1  @KevinLu45010771 . It wrote these two
PRs for edreisMD/plugnplai#91  edreisMD/plugnplai#75  Total lifesaver 🙌. Sweep
just saved me 6 hours of work.

Sagar Patil

Product Manager, SSL Zen 

Sweep helped me fix 2 issues in less than 10 mins. This would have taken me at
least 30-45 mins manually. I also have to say everything is very fast now. It's
working great, just one message and it intelligently understands the problem and
suggests a fix that just works! Kudos to you guys!

Kunal Gupta

CEO of Withfriends 

It’s a little bit like having a junior intern, which doesn’t sound like a lot at
first, but you can run like 100 junior interns at once and they can cover a lot
of ground in parallel.

Jeremy Evans

Co-founder + CTO at savvy 

Holy crap, I'm seriously impressed 🤯. Other than one issue it seems to be
word-perfect. Exactly how I'd write it, and it understands all our
company-specific concepts. Very impressive! 🙌

Develop at ease, with Sweep

  Get Started

© 2023 Sweep AI, Inc.