Housesigma Collector
v1.0.0
Collects HouseSigma listing data automatically and manually, stores it in the Hauscout SQLite DB, and supports periodic updates via a cron job.
Security Scan
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name/description (collect HouseSigma data into Hauscout SQLite DB) matches the runtime instructions: cd into a local project, run scripts/collect.ts, and insert into hauscout.db. The use of sqlite3 and browser DOM extraction is coherent with the purpose. However the SKILL.md hard-codes absolute user-local paths (/Users/kendrick/...), which makes it tied to a specific machine and may not apply to other users.
Instruction Scope
Instructions tell the agent to read and update a local SQLite DB, run a TypeScript script via npx tsx, and then git add/commit/push the DB so it becomes available via Vercel. Automating git push of a database can cause unintended data exfiltration; the manual-collection instructions also direct DOM scraping and direct INSERTs into the DB. The steps are within the stated purpose, but the DB commit/push and lack of guidance about what is safe to push are risky.
Install Mechanism
There is no explicit install spec (instruction-only). The runtime commands rely on npx tsx and sqlite3/git being present. Using npx will fetch and run packages from npm at runtime which can execute arbitrary code — this is expected for running a local TypeScript script but is a noteworthy risk if you haven't audited scripts/collect.ts and package dependencies.
Credentials
The skill declares no required environment variables or credentials, which is consistent. However, the workflow implicitly uses system git credentials (for git push), network access to npm (npx), and possibly browser automation credentials/session state. Requiring a commit & push of the DB is disproportionate to mere collection unless the user explicitly intends to publish that DB; it may leak sensitive data if present.
Persistence & Privilege
always is false and the skill is instruction-only with no install; it does not request permanent presence or modify other skills. Autonomous invocation is allowed (platform default) but not a special privilege here. The main privilege is being asked to operate on local files and run commands when invoked.
What to consider before installing
This skill appears to do what it says (collect HouseSigma listings into a local Hauscout SQLite DB), but review several things before installing or running it:
1) Inspect scripts/collect.ts and package.json in the referenced project to ensure the code is safe and dependencies are trusted; npx tsx will fetch/execute code.
2) Do not commit and push the SQLite DB unless you are sure it contains no sensitive data; consider adding data/hauscout.db to .gitignore or using a scrubbed export for publishing.
3) The SKILL.md uses absolute paths tied to a specific user; update paths to match your environment.
4) Be aware that running the script will use your git credentials and network access; if you need to avoid accidental remote pushes, run in an isolated environment or remove the git push step.
5) Check HouseSigma's terms of service and robots/rate-limiting policies before scraping.
If you can provide the actual scripts/collect.ts (and package.json) or a vetted install spec, I can give a higher-confidence assessment.
Like a lobster shell, security has layers: review code before you run it.
latest
HouseSigma Collector Skill
A skill that collects listing data from HouseSigma and stores it in the Hauscout SQLite DB.
Project paths
- Hauscout: /Users/kendrick/projects/hauscout
- DB: /Users/kendrick/projects/hauscout/data/hauscout.db
- Script: /Users/kendrick/projects/hauscout/scripts/collect.ts
Usage
Automatic collection (based on search profiles)
cd /Users/kendrick/projects/hauscout && npx tsx scripts/collect.ts
Collect a specific listing
cd /Users/kendrick/projects/hauscout && npx tsx scripts/collect.ts --url "<housesigma_url>"
Collect a specific profile only
cd /Users/kendrick/projects/hauscout && npx tsx scripts/collect.ts --profile <id>
Collect with the browser window visible (debugging)
cd /Users/kendrick/projects/hauscout && npx tsx scripts/collect.ts --headed
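The three flags above (--url, --profile, --headed) could be read with Node's built-in util.parseArgs. The following is only a minimal sketch: the actual scripts/collect.ts is not shown on this page, so the CollectOptions shape and the parseCollectArgs name are hypothetical, and treating the profile id as an integer is an assumption.

```typescript
import { parseArgs } from "node:util";

// Hypothetical option shape; the real scripts/collect.ts may differ.
interface CollectOptions {
  url?: string;     // --url "<housesigma_url>": collect a single listing
  profile?: number; // --profile <id>: collect a single search profile
  headed: boolean;  // --headed: show the browser window while collecting
}

function parseCollectArgs(argv: string[]): CollectOptions {
  // parseArgs is strict by default, so unknown flags throw an error.
  const { values } = parseArgs({
    args: argv,
    options: {
      url: { type: "string" },
      profile: { type: "string" },
      headed: { type: "boolean" },
    },
  });

  // Assumption: profile ids are integers (the source only shows "<id>").
  const profile =
    values.profile === undefined ? undefined : Number(values.profile);
  if (profile !== undefined && !Number.isInteger(profile)) {
    throw new Error(`--profile expects an integer id, got "${values.profile}"`);
  }

  return { url: values.url, profile, headed: values.headed ?? false };
}
```

A script would typically call parseCollectArgs(process.argv.slice(2)); with no flags it returns headed: false and leaves url and profile undefined, which corresponds to the automatic, profile-based run.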
Manual collection (using the browser directly)
When collecting directly with the Clawdbot browser instead of the Playwright script:
- Open the HouseSigma listing detail page in the browser
- Extract the data from the DOM snapshot
- INSERT directly into SQLite
DOM data mapping
Structure of the HouseSigma detail page:
- Address/status: <h1> tag (Unit X - Street - Municipality - Community)
- Price: $ X,XXX pattern inside an <em> tag
- Key Facts: <dt>/<dd> pairs (Tax, Property Type, Maintenance, etc.)
- Details: the same <dt>/<dd> pattern (Bedrooms, Bathrooms, etc.)
- Room information: text pattern in the "Metres" section
- Estimates: SigmaEstimate, Estimated Rent, Rental Yield
- Schools: Catchment Schools section
- Popularity: "Popularity : XX/100" text
- Community statistics: Community Statistics section
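Once the text of these elements has been pulled from the DOM snapshot, a few of the patterns in the mapping can be turned into values with plain string parsing. The sketch below is illustrative only: the function names are hypothetical, it works on already-extracted text rather than on the DOM, and it assumes the heading always has exactly the four " - " separated parts shown in the mapping.

```typescript
// Parse a price such as "$ 799,000" into a number; null when no price is found.
function parsePrice(text: string): number | null {
  const m = text.match(/\$\s*(\d[\d,]*)/);
  return m ? Number(m[1].replace(/,/g, "")) : null;
}

// Parse 'Popularity : 87/100' into 87; null when the pattern is absent.
function parsePopularity(text: string): number | null {
  const m = text.match(/Popularity\s*:\s*(\d{1,3})\s*\/\s*100/);
  return m ? Number(m[1]) : null;
}

// Hypothetical record for the <h1> heading parts.
interface ListingHeading {
  unit: string;
  street: string;
  municipality: string;
  community: string;
}

// Split "Unit X - Street - Municipality - Community" into its parts.
// Assumption: exactly four parts; anything else returns null so the
// caller can fall back to manual inspection.
function parseHeading(h1: string): ListingHeading | null {
  const parts = h1.split(" - ").map((p) => p.trim());
  if (parts.length !== 4 || parts.some((p) => p.length === 0)) return null;
  const [unit, street, municipality, community] = parts;
  return { unit, street, municipality, community };
}

// Turn <dt>/<dd> text pairs (Key Facts, Details) into a lookup object.
function pairsToRecord(pairs: Array<[string, string]>): Record<string, string> {
  const record: Record<string, string> = {};
  for (const [label, value] of pairs) {
    record[label.trim()] = value.trim();
  }
  return record;
}
```

Returning null instead of throwing keeps a single unexpected page layout from aborting a whole collection run; whether that is the right trade-off depends on how strictly the DB rows should be validated.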
Search profile management
Add a profile:
cd /Users/kendrick/projects/hauscout
sqlite3 data/hauscout.db "INSERT INTO search_profiles (name, center_lat, center_lng, radius_km, property_types, price_min, price_max, beds_min, beds_max, baths_min, baths_max, is_active) VALUES ('<name>', <lat>, <lng>, <radius_km>, '[\"Condo Apartment\"]', 0, 800000, 2, 3, 1, 2, 1);"
Check current profiles:
sqlite3 data/hauscout.db "SELECT * FROM search_profiles;"
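When profiles are added from code rather than the sqlite3 CLI, a parameterized statement avoids the shell and SQL quoting seen in the command above. This is a rough sketch: the column names come from the INSERT statement on this page, but the TypeScript types are inferred from its example values, and SearchProfile and buildProfileInsert are hypothetical names. It only builds the SQL text and parameter list, which can then be handed to whichever SQLite driver the project uses.

```typescript
// Column types are assumptions inferred from the example INSERT values.
interface SearchProfile {
  name: string;
  center_lat: number;
  center_lng: number;
  radius_km: number;
  property_types: string[]; // stored as a JSON array string, e.g. '["Condo Apartment"]'
  price_min: number;
  price_max: number;
  beds_min: number;
  beds_max: number;
  baths_min: number;
  baths_max: number;
  is_active: boolean; // stored as 1 or 0
}

const PROFILE_COLUMNS = [
  "name", "center_lat", "center_lng", "radius_km", "property_types",
  "price_min", "price_max", "beds_min", "beds_max",
  "baths_min", "baths_max", "is_active",
] as const;

// Build the SQL text and the positional parameters in column order.
function buildProfileInsert(p: SearchProfile): { sql: string; params: Array<string | number> } {
  const params: Array<string | number> = [
    p.name, p.center_lat, p.center_lng, p.radius_km,
    JSON.stringify(p.property_types),
    p.price_min, p.price_max, p.beds_min, p.beds_max,
    p.baths_min, p.baths_max, p.is_active ? 1 : 0,
  ];
  const placeholders = PROFILE_COLUMNS.map(() => "?").join(", ");
  const sql = `INSERT INTO search_profiles (${PROFILE_COLUMNS.join(", ")}) VALUES (${placeholders});`;
  return { sql, params };
}
```

Keeping the values out of the SQL string means a profile name containing a quote character cannot break the statement.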
After data collection
To reflect the collected data on the dashboard:
cd /Users/kendrick/projects/hauscout
git add data/hauscout.db
git commit -m "data: daily collection $(date +%Y-%m-%d)"
git push
The latest data is reflected automatically when Vercel deploys.
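The same add, commit, and push sequence can be scripted so that it runs after a collection. The sketch below is one possible approach, not the project's actual automation: the function names are hypothetical, and defaulting to a dry run is a design choice made here in light of the security scan's caution about pushing the DB file.

```typescript
import { execFileSync } from "node:child_process";

// Local-time date as YYYY-MM-DD, matching `date +%Y-%m-%d`.
function localDateStamp(d: Date = new Date()): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
}

// The three git commands from the steps above, as argument arrays.
function publishCommands(d: Date = new Date()): string[][] {
  return [
    ["git", "add", "data/hauscout.db"],
    ["git", "commit", "-m", `data: daily collection ${localDateStamp(d)}`],
    ["git", "push"],
  ];
}

// Run the commands inside repoDir. With dryRun (the default here) the
// commands are only printed, so nothing is committed or pushed by accident.
function publish(repoDir: string, dryRun: boolean = true): void {
  for (const [cmd, ...args] of publishCommands()) {
    if (dryRun) {
      console.log([cmd, ...args].join(" "));
      continue;
    }
    execFileSync(cmd, args, { cwd: repoDir, stdio: "inherit" });
  }
}
```

Passing arguments as an array to execFileSync avoids shell interpolation of the commit message; calling publish(repoDir, false) performs the real push and should only be used once the DB contents are known to be safe to publish.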
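Cron job
Automatic collection every day at 9:00 AM via Clawdbot cron:
- Run the script → update the DB → git commit & push
Notes
- Keep a 2-4 second interval between requests to avoid HouseSigma rate limiting
- Headless mode may get blocked → check with the --headed option
- A search profile with many results takes a long time (~5 seconds per listing)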
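The 2-4 second spacing between requests can be implemented as a randomized pause between listings. The following is a minimal sketch with hypothetical helper names; the collection script may already handle pacing differently.

```typescript
// Pick a delay between minMs and maxMs (inclusive of both ends after rounding).
// The random source is injectable so the function can be tested deterministically.
function randomDelayMs(
  minMs: number = 2000,
  maxMs: number = 4000,
  rand: () => number = Math.random,
): number {
  return Math.round(minMs + rand() * (maxMs - minMs));
}

// Resolve after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Visit items one at a time, pausing between consecutive visits
// (no pause after the last item).
async function forEachWithDelay<T>(
  items: T[],
  visit: (item: T) => Promise<void>,
  delayMs: () => number = randomDelayMs,
): Promise<void> {
  for (let i = 0; i < items.length; i++) {
    await visit(items[i]);
    if (i < items.length - 1) {
      await sleep(delayMs());
    }
  }
}
```

Processing listings sequentially rather than in parallel is what keeps the request rate bounded; the random jitter simply avoids a perfectly regular request pattern.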
